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ABSTRACT 


This  thesis  examines  a  number  of  underwater  acoustic  signals  and  the  problem  of  classifying 
these  signals  using  a  back-propagation  neural  network.  The  neural  network  classifies  the  signals 
based  upon  features  extracted  fi-om  the  original  signals.  The  effect  on  classification  by  using  an 
adaptive  line  enhancer  for  noise  reduction  is  explored.  Two  feature  extraction  methods  have  been 
implemented;  modeling  by  an  autoregressive  technique  using  the  reduced-rank  covariance  method, 
and  the  discrete  wavelet  transformation.  Both  orthonormal  and  non-orthonormal  transforms  are 
considered  in  this  study. 
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L  INTRODUCTION 


A.  SCOPE  OF  STUDY 

The  United  States  Navy’s  undersea  acoustic  surveillance  systems  were  used  during  the 
Cold  War  to  detect  and  track  enemy  submarine  activities.  These  systems  consist  of  both  fixed  and 
mobile  hydrophone  arrays.  Many  aspects  of  these  arrays  are  now  declassified  and  are  being  used 
by  geophysicists  to  monitor  undersea  earthquakes  and  volcanoes.  The  undersea  SOund 
Surveillance  System  (SOSUS),  array  out  of  Whidbey  Island  Washington  is  of  interest  to  this 
thesis.  It  consists  of  fixed  hydrophones  mounted  at  the  approximate  depth  of  the  deep  sound 
channel.  These  hydrophones  are  very  sensitive  and  have  the  ability  to  pick  up  very  low  frequency 
sounds.  [13  p.  3] 

Earthquakes  have  associated  with  them  three  phases,  representing  three  separate  types  of 
energy  release.  First  is  the  primary  phase,  or  p-phase.  This  phase  consists  of  energy  that  is 
transmitted  as  compressed  shock  waves  within  the  earth’s  crust.  The  secondary,  or  s-phase, 
consists  of  transverse  shock  waves  also  being  transmitted  through  the  earth’s  crust.  The  last 
phase,  the  tertiary  or  t-phase,  is  only  associated  with  underwater  earthquakes.  The  t-phase  consists 
of  acoustic  energy  that  is  transmitted  into  the  water  column  from  the  shaking  ocean  floor.  The 
definition  has  been  expanded  recently  to  include  any  low-frequency  seismic  event  that  transmits 
acoustic  energy  into  the  water  column.  [13  p.  4] 

The  National  Oceanic  and  Atmospheric  Administration  (NOAA),  Pacific  Marine 
Environmental  Laboratory  is  conducting  a  study  of  underwater  geological  processes,  such  as 
earthquakes  and  volcanic  activities  using  the  Whidbey  Island  SOSUS  array.  This  undCTsea 
surveillance  system  also  has  great  potential  for  use  in  tracking  and  estimating  populations  of 
marine  mammals.  The  purpose  of  this  thesis  is  to  investigate  classification  procedures  which 
would  allow  for  proper  separation  of  geological  processes  and  biological  signals,  and  would 
classify  the  various  biological  signals  for  further  biological  studies.  A  Neural  Network  (NN) 
configuration  is  chosen  in  this  study  to  automate  the  classification  process.  NN  have  great 
potentials  in  automatic  classification  problems,  as  they  can  learn  through  examples  and  do  not 
require  a  precise  mathematical  model  for  the  signal  characteristics.  However,  NN  implementations 
require  the  set  of  training  data  to  be  representative  of  the  various  signals  to  be  classified  to  have 
good  performance.  Thus,  the  success  of  any  classification  scheme  depends  on  a  judicious  choice  of 
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unique  features  that  allows  the  classifier  to  discriminate  between  each  class  of  interest.  This  thesis 
explores  the  use  of  autoregressive  (AR)  modeling  and  wavelet  decomposition  as  feature  extraction 
techniques.  In  addition,  we  investigate  the  application  of  an  adaptive  line  enhancer  to  de-noise  the 
underwater  data.  The  AR  coefficients  used  as  NN  inputs  are  computed  using  a  reduced-rank 
co\-ariance  method.  This  technique  combines  traditional  covariance  method  and  the  singular  value 
decomposition  to  reduce  the  effect  of  noise  in  the  signal  model.  Next,  orthogonal  and  non- 
orthogonal  implementations  of  the  discrete  wavelet  transform  are  chosen  as  feature  extractors.  The 
orthogonal  decomposition  uses  two  different  wavelet  bases;  Symmlet  8,  and  Coiflet  3  coefficients. 
The  .A-Trous  implementation  of  the  non-orthogonal  decomposition  uses  a  modified  Morlet  wavelet. 
This  classification  study  uses  5  different  species  of  whale  (Sperm.  Killer,  Humpback.  Gray,  and 
Pilot  Whale)  and  underwater  earthquake  data. 

Chapter  II  describes  the  methodology  used  in  the  signal  selection.  Chapter  III  presents  the 
anahtical  methods  used  to  compute  the  input  parameters  to  the  Neural  Network  implementation. 

In  this  chapter,  we  first  describe  the  adaptive  line  enhancer  procedure  designed  to  reduce  the  effects 
to  wideband  noise  contained  in  the  recordings.  Next,  we  present  the  reduced-rank  covariance  AR 
modeling  technique.  Finally,  we  briefly  review  mulliresolution  algorithms  and  present  their 
application  to  our  classification  problem.  Chapter  IV  describes  the  back-propagation  neural 
network  implementation  used  during  our  study.  Results  are  presented  in  Chapter  V.  Last,  Chapter 
\T  presents  conclusions  and  recommendations  for  further  study. 
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II.  SIGNAL  SELECTION 

The  operating  environment  of  the  undersea  sonar  array  is  in  the  Pacific  Northwest,  off  the 
coast  of  Whidbey  Island  Washington.  Researchers  familiar  with  the  array  data  have  indicated  the 
presence  of  marine  mammal  sounds  in  the  recordings,  as  this  array  is  located  in  a  whale  migration 
route.  Thus,  this  system  has  a  high  potential  for  use  in  marine  mammal  studies.  In  order  to  separate 
underwater  seismic  data  from  marine  mammal  sounds,  we  selected  recordings  from  a  cross  section  of 
whale  species  that  represent  the  population  in  the  array  system  area  during  the  yearly  migration 
periods.  The  frequency  characteristics  of  the  various  whale  species  cover  the  entire  frequency 
spectrum.  The  biological  signals  range  from  very  short  pulses  to  long  melodic  songs.  In  addition  to 
the  underwater  earthquake  used,  the  types  of  biological  signals  chosen  for  the  study  are:  Sperm 
Whale,  Killer  Whale,  Humpback  Whale,  Gray  Whale,  and  Pilot  Whale. 

Note  that  even  though  the  array  data  indicates  the  presence  of  some  species  of  whale 
sounds,  the  biological  and  earthquake  signals  used  were  obtained  from  a  different  source  [12].  The 
decision  not  to  use  biological  cuts  obtained  from  the  system  array  data  was  motivated  by  the  fact 
that  we  were  unable  to  obtain  proper  identification  of  the  various  species  of  whales  by  a  specialist. 
Thus,  in  order  to  minimize  any  problems  due  to  incorrect  training  we  used  cuts  from  a  commercial 
audio  cassette  in  which  the  whale  species  have  been  properly  identified.  Each  recording  varied  in 
length  from  fifteen  to  thirty  seconds.  The  recordings  were  of  real,  open  ocean  encounters  from 
various  signal  collection  platforms.  The  signals,  as  an  artifact  of  how  they  were  collected,  were  all 
corrupted  with  background  noises.  The  corruption  of  the  signals  includes  sounds  from  ships,  small 
boats,  and  other  disturbances  occurring  in  the  natural  environment,  plus  artificial  noise  from  the 
means  of  collection. 

Each  signal  was  digitized  on  a  486  PC  using  a  Media  Vision  Pro  Audio  Spectrum  sound  card 
at  8  kHz,  using  a  single  channel  and  8  bits  per  sample.  This  ensured  audio  compatibility  with  the  Sun 
workstation’s  audio  playback  which  is  limited  to  single  channel  and  8  bit,  8  kHz  data.  The  digitized 
recordings  were  then  processed  to  extract,  as  close  as  possible,  the  whale  signals  from  the  recordings, 
using  audio,  and  visualizations  of  the  time  and  frequency  domains.  The  whale  ‘songs’  were  then 
selected  and  separated  from  the  various  other  auditory  manifestations  of  whale  sounds.  Next,  these 
sample  songs  were  pasted  together  to  make  a  nearly  continuous  song  for  each  species.  The  sperm 
whale  signal  was  of  a  different  nature.  The  recording  is  of  the  animals’  echo  ranging  sonar.  The  very 
short,  rapid  pulses  were  all  kept  intact.  The  earthquake  recording  was  also  kept  intact.  The  pilot 
whale  sound  produced  a  challenge  as  it  was  very  short  in  duration,  has  high  pitch,  and  was  buried  in 
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noise.  It  was  especially  difficult  to  separate  the  background  noise  from  the  pilot  whale  signal.  Ficure 
2-1  presents  tvpical  time  domain-plots  from  each  class  of  signal  studied. 

Time  Domain  Signals 


1  -5  2  2.5 

Samples  Earthquake 


Figure  2-1  Time  domain  signals  of:  a)  sperm  whale,  b)  killer  whale,  c)  humpback  whale,  d)  gray 

whale,  e)  pilot  whale,  and  f)  earthquake. 

In  this  chapter  we  have  identified  the  signals  that  we  are  to  classify.  Next  in  Chapter  III. 
we  introduce  the  methods  of  feature  extraction  used  in  this  study.  Specifically  we  present  the 
reduced  rank  AR  modeling  technique,  the  discrete  wavelet  transform  producing  wavelet  based 
parameters,  and  the  adaptive  line  enhancer  to  reduce  the  effects  of  wide-hand  noise  contained  in  the 
signals. 
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III.  FEATURE  EXTRACTION  OF  BIOLOGICAL  AND  SEISMIC 

SIGNALS 


A.  INTRODUCTION 

The  biological  and  seismic  signals  used  in  this  study  consist  of  data  records  that  typically 
exceed  40,000  data  points,  which  precludes  any  realistic  method  for  classification  based  on  the 
data  directly.  Therefore  signal  characteristics  need  to  be  expressed  in  ways  that  retain  the  signals’ 
unique  features,  yet  drastically  reduce  the  amount  of  data  needed  as  inputs  to  the  classifier. 

Feature  extraction  methods  are  a  means  of  modeling  a  signal  based  on  some  specific  property  of 
the  signal.  These  features  are  then  used  to  classify  the  signal. 

One  of  the  most  distinguishing  features  of  an  acoustic  signal  is  its  spectral  content.  The 
signals  investigated  in  this  study  are  generated  by  resonating  cavities,  vibrating  vocal  cords,  and  in 
the  case  of  seismic  signals,  the  vibration  of  rock  formations  rubbing  against  each  other.  As  a 
result,  most  of  the  energy  contained  in  the  signals  under  study  is  concentrated  in  a  few  firequency 
components.  Note  that  an  excq)tion  to  this  is  the  sperm  whale  data,  which  contains  a  v^y  wide 
spectral  content.  As  a  result,  spectral  infcamation  can  be  used  to  distingmsh  between  the  different 
classes  of  signals.  Two  different  procedures  are  considered  in  the  study  to  extract  frequency  based 
information.  First  we  apply  a  reduced-rank  AR  modeling  technique  to  compute  AR  coefficients 
which  are  used  as  inputs  to  the  neural  network  classifier.  Next,  we  use  the  discrete  wavelet 
transform  to  compute  wavelet-based  parameters  which  are  used  as  inputs  to  the  NN.  In  addition, 
we  investigate  the  application  of  an  adaptive  line  enhancer  (ALE)  algorithm  as  a  pre-processing 
step  to  reduce  the  effects  due  to  wide-band  noise  contained  in  the  data.  This  chapter  presents  the 
feature  extraction  methods  used  in  this  study.  The  first  part  of  this  chapter  presents  the  ALE  noise 
reduction  procedure.  Next,  we  present  the  reduced-rank  autoregressive  modeling.  The  discrete 
wavelet  transformation  and  its  application  to  our  study  is  presented  last. 
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B.  NOISE  REDUCTION  USLNG  ADAPTIVE  LINE  ENHANCEMENT 

1.  Introduction 

This  study  investigates  the  application  of  an  Adaptive  Line  Enhancement  (ALE)  algorithm 
to  decrease  the  effect  due  to  wideband  noise  present  in  the  signals  under  study.  The  ALE  algorithm 
is  a  gradient  descent,  adaptive  filter  based  on  the  least  mean  square  (LMS)  algorithm.  The 
algorithm  tracks  narrowband  frequencies  contained  in  the  input  signal,  and  removes  wideband 
components  that  are  uncorrelated  to  the  frequencies  present.  First,  we  briefly  review  the  concept  of 
the  LMS  algorithm,  and  next  we  present  its  application  in  the  ALE  algorithm. 

2.  Least  Mean  Square  Algorithm 

The  motivation  for  the  least  mean  square  algorithm  is  to  design  a  filter  that  learns  from  its 
ensironment  and  converges  to  the  optimum  weight  values  given  by  the  Wiener-Hopf  equations  in  a 
stationar}-  environment.  The  LMS  algorithm  is  a  stochastic  gradient-based  algorithm  which  is 
simple  to  implement  and  is  effective  in  its  ability  to  adapt  to  the  external  environment. 

a.  Overview  Of  The  Structure  And  Operation  of  the  LMS  Algorithm 
The  system  building  blocks  for  the  LMS  algorithm  are  depicted  in  Figure  3-1 
below.  A  similar  block  diagram  will  be  shown  later  for  the  ALE  implementation.  The  input  signal 
is  fed  into  an  adaptive  FIR  filter,  and  the  output  of  the  filter,  y(n),  is  compared  to  the  desired 
signal.  d(n).  forming  the  error  e(n).  The  second  important  feature  of  this  scheme  is  the  adaptive 
control  process  where  the  filter  weights  are  updated  based  on  the  direction  needed  to  minimize  the 
mean  square  error.  The  filter  weights  are  updated  in  accordance  with: 

M;(n  +  1)=  w(n)  +  ^/y[-V/(n)],  (3.1) 

where  ntn)  is  the  filter  w'cight  vector,  p  is  the  step  size  parameter,  and  Vj(n)  is  the  gradient  of  the 
mean  square  error  function.  The  negative  gradient  of  the  mean  square  error  function  , 

-Vj(n).  is  approximated  by  the  instantaneous  gradient  expression: 

J {n)  ~  2u{n)  ■  e\n) ,  (3.2) 

where  n(n)=[»(n),  »(/;-!), ...  ,«(/;-P-i-l)]^,  and  P  is  the  length  of  the  filter. 
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y[n] 


Figure  3-1  The  LMS  Algorithm. 


Thus  the  LMS  estimated  weight  vector  is  obtained  using  the  instantaneous  gradient  and  is  given 
by: 

w(w  + 1)  =  +  /je*{ti)u(n) .  (3.3) 


The  LMS  weight  coefficients  are  given  by: 

+  =  +  k  =  0,l,...,P-l  (3.4) 

The  step  size  parameter,  fi,  is  defined  as  a  positive  real  constant  which  controls  the  size  of 
the  incremental  weight  correction,  and  is  bound  by: 


Q<ju< 


1 

P-R„(0)’ 


(3.5) 


to  insure  that  the  algorithm  converges.  Note  that  the  LMS  weights  exhibit  a  random  motion 
aroimd  the  optimal  solution  due  to  of  the  instantaneous  approximation  of  the  gradient,  also  known 
as  gradient  noise.[3,  p.  300] 

The  LMS  algorithm  is  implemented  as  follows[3,  p.  332]: 

1)  At  time  n=0,  initialize  the  weight  vector  to  an  estimate  of  w(0) 

2)  y(n)=  H'"(n)u(n) 
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3)  e(n)  =  d(n)-y(n) 

4)  vy(n+l)  =  w  (n)  +4u(n)c*(n) 

%\  here  the  parameters  of  the  algorithm  to  be  chosen  by  the  user  are  the  length  of  the  filter.  P,  and 
the  step  size  parameter,  p. 

3.  .Adaptive  Line  Enhancement  Using  the  LMS  Algorithm 

The  adaptive  line  enhancer  using  the  LMS  algorithm  is  a  logical  choice  to  reduce  the 
wideband  noise  inherent  in  the  sampled  recordings.  The  motivation  behind  using  the  LMS 
algorithm  is  that  it  is  simple  to  implement,  and  has  been  historically  proven  to  achieve  satisfactory 
performance,  \\Tien  used  in  the  right  conditions,  the  results  can  be  of  high  quality. 

The  signals  under  investigation  vary  widely  in  nature  from  very  localized  in  time  (broad 
frequency  range),  in  the  case  of  the  sperm  whale,  to  very  localized  in  frequency  (slowly  changing  in 
time),  in  case  of  the  humpback  w'hale.  Thus  the  step  size  was  chosen  to  be  0.05  percent  of  the  total 
signal  power  as  a  result  of  a  trial  and  error  process. 

The  number  of  filter  weights  was  chosen  to  decrease  the  total  number  of  frequency  related 
components  contained  in  the  signals,  and  yet  preserve  enough  to  be  able  to  discriminate  between 
each  class  of  signal.  This  study  uses  a  filter  length  of  10  to  extract  the  five  most  dominant 
sinusoids  in  each  signal  class.  The  figure  below  represents  the  implementation  of  the  ALE 
algorithm  to  estimate  the  signals  present  in  the  biologic  and  seismic  data. 


Input  Signal  u[n] 


Figure  3-2  The  Adaptive  Line  Enhancer. 


The  delay  A,  shifts  the  signal  in  time  causing  the  broadband  noi.se  contained  in  u(n-A)  to 
become  uncorrelatcd  with  the  broadband  noi.se  contained  in  the  reference  signal  d(n)  =  u(n).  By 
experimentation  wc  found  that  a  delay  of  one  sample  was  sufficient  to  decrease  the  contribution  of 
broadband  noise  without  distorting  the  .sperm  whale  data. 
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The  output  of  the  filter  reflects  only  the  signal  that  is  correlated  to  the  desired  signal, 
provided  the  filter  is  operating  optimally.  The  MATLAB®  program  used  in  this  study 
LMSALE.M  was  written  by  LT  D.  Brown,  USN  [14]  and  is  included  in  the  appendix.  The 
following  spectrograms  represent  the  signals  before  and  after  applying  the  ALE  algorithm.  Note 
that  the  results  reflect  the  trade  off  in  step  size,  filter  length,  and  delay  for  all  classes.  This  process 
as  implemented  greatly  improved  the  gray  whale  data,  and  had  varying  degrees  of  success  for  the 
other  classes. 

Each  of  the  following  spectrograms  was  obtained  using  the  MathWorks  function 
SPECGRAM.M.  A  Hanning  window  of  size  512  with  50  %  overlap  is  used  and  the  FFT  length 
chosen  is  5 1 2.  Note  that  the  spectrograms  are  given  in  terms  of  normalized  frequency,  where  half 
sampling  frequency  is  rqjresented  by  0.5  and  the  time  axis  is  expressed  in  terms  of  number  of 
samples.  The  spectral  information  is  mapped  to  color  values.  The  range  of  color  values  is  red, 
orange,  yellow,  green  and  blue.  High  intensity  spectra  appear  as  red.  Low  intensity  spectra  appear 
as  blue. 
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Spectrogram  of  Gray  Whale  Data 


Figure  3-3  Spectrogram  of  gray  whale  data;  frequency  (fs/2  =  0.5),  normalized  time  expressed  in 

number  of  samples. 


Spectrogram  of  ALE  Gray  Whale  Data 


°0  0.5  1  1-5  2 


Time  x  1 


Figure  3-4  Spectrogram  of  gray  whale  data  after  processing  with  ALE;  normalized  frequency 
(fs/2  =  0.5),  normalized  timeexpressed  in  normalized  number  of  samples. 


Frequency 


Figure  3-12  Spectrogram  of  killer  whale  data  after  processing  with  ALE;  normalized  frequency 


(fs/2  =  0.5),  normalized  time  expressed  in  number  of  samples. 
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C.  REDUCED-RANK  AUTOREGRESSIVE  MODELING 


1.  Introduction 

The  various  classes  of  signals  under  investigation  in  this  study  exhibit  differences  in  their 
spectra,  therefore  spectral  information  is  used  for  classification  purposes.  The  autoregressive  (AR) 
coefficients  of  the  filter  used  to  model  the  original  signal  are  the  characterizing  parameters  chosen 
as  input  parameters  because  they  rqjresent  the  spectra  of  the  signals.  They  are  used  to  classify  the 
various  classes  of  signals  under  study. 

In  this  section,  we  first  introduce  the  concept  of  autoregressive  modeling  and  the 
covariance  method  used  in  this  study.  Next,  we  consider  the  problem  of  selecting  the  model  order. 
Finally,  we  present  the  application  of  reduced-rank  modeling  to  the  covariance  method  used  to 
decrease  the  effect  of  noise  in  the  data. 

2.  Autor^ressive  Modeling 

Autoregressive  (AR)  modeling  is  based  on  the  idea  that  an  original  signal  x(n)  can  be 
expressed  as  the  output  of  an  all-pole  linear  shift  invariant  predictive  filter  driven  by  white  noise. 

In  the  time  domain,  this  means  that  the  signal  x(n)  may  be  expressed  as  a  linear  combination  of 
previous  values  x(n-i),  i  =  1,2, ...  P,  and  some  input  noise  sequence  w(n). 


p 

x(n)  =  -  ^ a{k)x(n  -k)  +  b^win) , 


*=i 


(3.6) 


where  P  is  the  order  of  the  predictor,  bo  is  the  gain,  and  (a(l), ... ,  a(P))  are  the  coefficients  of  the 
linear  predictor  to  be  determined.  Taking  the  Z-transform  of  (3.6),  the  resulting  transfer  function 
of  the  system  used  to  generate  x(n)  ft^om  w(n)  is  given  by: 


H{z)  = 


X{z)  ^  b, 

W(z)  \  +  a^z'^+...+aj,z' 


A(Z) 


(3.7) 


The  AR  coefficients  can  be  obtained  by  solving  a  set  of  linear  equations  obtained  from  Equation 
(3.6).  Using  the  properties  of  the  AR  model,  the  correlation  function  RJ^l)  obtained  from  x(n)  is 
given  by; 

R,  (/)  =  -a, Q  -  D- .  ~a,R,  (l-P)  +  b,R^  (/) , 

which  leads  to; 


(0  +  (^  !)■*  ^^p^x  0  —  b^  R^^  (/) .  (3.8) 
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The  cross  correlation  term  Rmc(0  can  be  expressed  as  the  convolution  of  the  impulse  response  h(n) 
of  the  AR  system  with  the  autocorrelation  of  the  noise  sequence  Rw(n)  as; 


where 


which  leads  to 


RAi)=<ylm 

R^Q)  =  Kl)-al5{l)  =  alKl). 


R^(l)  =  clh\-l).  (; 

By  substituting  (3.9)  into  (3.8),  the  correlation  difference  equation  becomes: 

R,il)+a,R,{l-\)+--+apRS-P)  =  b,alh\-l).  (3 

Note  that  h(n)  is  the  impulse  response  of  a  causal  filter,  therefore  h(n)  is  equal  to  0  for  n  <  0. 
Next,  using  the  Initial  Value  Theorem  we  have; 


/i(0)  =  lim  H(z)=  lim; - - y  =  b,. 

z-^o.\  +  a^z->r---+apZ 


Therefore, 


Thus,  Equation  (3.10)  becomes; 


forl=0 


KJI)  =  0 


for  /  >  0. 


(/)  +  ajR^  (/  -  !)+•  •  •+apR^  (l-P)=  b^alSil),  for  I  >  0.  (3.12) 

Expressing  Equations  (3.12)  for  1  =  0,...,  P,  leads  to  the  extended  set  of  Yule- Walker  equations 
[l,p.414] 


-RM  ^.(-1) 
^.(1)  ^.(0) 


R.{-P)  1  1  al\b,\^ 

R,(-P  +  l)  0 


R^{P)  R,{P-\) 


RM  a. 


The  set  of  AR  coefficients  of  the  filter  a  =  [1,  ai...,ap]^  can  be  obtained  by  solving  the  set 
of  linear  equations  derived  from  Equation  (3.13); 
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(3.14) 


^.(0)  ^,(-1)  -  RA-p+\y 

/?,(!)  /?,(0)  •••  R^(-p  +  2) 

“  — 

RA+2) 

_RAP-l)  RAP-2)  •••  RAO) 

_RA+P)_ 

R, 

a 

■■  d. 

Note  that  in  practical  situations  the  true  correlation  matrix  is  usually  not  knowii  and  has  to  be 
estimated  from  the  observed  data.  Various  estimation  procedures  have  been  considered  in  [  1  ].  In 
this  study  the  covariance  method  is  used  to  estimate  the  correlation  matrix  because  no  assumptions 
are  made  about  the  data  beyond  the  length  of  the  prediction  filter.  Note  that  this  is  different  from 
the  autocorrelation  method  where  the  data  is  zero  padded. 

3.  The  Covariance  Method 

In  modeling  real  signals  where  the  true  first  and  second  order  statistics  are  not  known,  the 
correlation  structure  has  to  be  CvStimated  from  the  observed  data,  and  used  to  solve  for  the  AR 

coefficients  and  the  gain  term  bo  The  covariance  method  uses  the  following  equation  to  estimate 
correlation  lags: 

“  77^  p  X -  l)x(n) .  (3.15) 

l\  —  Jr  n=p 

The  gain  term  h'o  k  can  then  be  estimated  from  the  estimated  a  coefficients  by: 

2  ^ 

l^o|  where  £1;,=/.  (3.16) 

t=o 

The  estimated  correlation  matrix  resulUng  from  Equation  (3.15)  is  hermitian  (R,(k)  =  R/(-k)). 
Note  that  it  is  singular  if  the  data  consists  of  P-1  or  fewer  complex  sinusoids.  However  any  noise 
in  thv.  obser\ed  data  will  cause  the  matrix  to  become  non-singular.  A  noted  drawback  to  the 
covariance  method  is  that  the  resulting  e.stimated  pole  locations  are  not  guaranteed  to  be  inside  the 
unit  circle.  [2,  p.  223]  Hence  the  AR  filter  is  not  guaranteed  to  be  stable.  However  this  noted 
drawback  docs  not  prevent  using  the  AR  parameters  to  characterize  the  signal  properties.  In 
addition,  note  that  the  strength  of  the  covariance  method  over  the  autocorrelation  method  is  that  if 
the  signal  to  be  modeled  is  of  pure  sinusoids,  the  covariance  method  can  be  used  to  perfectly  model 
the  frequencies.  This  property  is  not  shared  by  the  autocorrelation  method. [2  p.  223] 

The  main  frequencies,  zt:.  obtained  by  the  AR  modeling  procedure  can  be  estimated  as  the 
root.s  of  the  polynomial: 
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A(z)  =  l  +  ajZ  '+---+apZ 


(3.17) 


An  estimate  of  the  spectrum  can  be  obtained  from  the  recursive  AR  model; 

x{n)  +  OyXin  -  1)h — \-apX{n  -P)  =  e(n)  (3.18) 


where  e(«)  is  the  modeling  error.  Ideally  e(n)  is  a  Gaussian  white  noise  sequence  with  a  gain  equal 
to  l&ol^<Tw^,  which  leads  to  the  frequency  spectrum: 


IA(e^'^)P  ’ 


0<<l)<2n. 


(3.19) 


4.  Model  Order  Selection 

Selecting  the  order  of  an  AR  model  is  a  difficult  task.  The  best  choice  is  usually  not 
known,  and  trial  and  error  methods  are  sometimes  used.  If  the  data  is  truly  described  by  a  finite 
order  AR  model,  then  theoretically  the  variance  will  become  constant  once  the  model  order  is 
reached.  In  practice  this  is  not  usually  true  for  a  variety  of  reasons.  The  estimate  may  not 
converge,  or  if  it  does,  it  may  be  difficult  to  judge  exactly  when  this  occurs.  A  number  of  criteria 
have  been  developed:  the  four  most  well  known  are  Akaike’s  Information-theoretic  (AIC),  Parzen’s 
criterion  Autoregressive  transfer  (CAT),  Akaike’s  final  prediction  error  (FPE)  and  Schwartz  and 
Rissanen’s  minimum  description  length  (MDL).[1 ,  p.  549]  In  this  study  the  initial  model  order 
was  chosen  by  estimating  the  model  order  using  AIC,  MDL,  CAT  and  FPE  criteria  on  the  sperm 
whale  signal.  These  criteria  were  implemented  earlier  in  the  program  ORDER.M  written  by  LT 
Ken  Frack,  USN.  [15]  The  sperm  whale  signal  was  used  to  set  the  model  order  because  it  is  highly 
localized  in  time  and  has  the  broadest  bandwidth  of  all  various  signal  classes  considered  in  this 
study.  Thus,  a  larger  model  order  may  be  needed  to  accurately  describe  it.  The  other  signals  are 
more  localized  in  frequency  and  typically  need  a  lower  model  order  unless  the  signal  has  a  low 
signal  to  noise  ratio.  The  results  of  running  ORDER.  M  are  shown  below  with  the  noted  criteria. 
For  this  study  the  model  order  was  chosen  to  be  25  based  on  the  mean  of  the  four  minimum  model 
orders. 
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5.  Reduced-Rank  Covariance  Method 

When  a  sinusoidal  signal  is  embedded  in  noise,  AR  modeling  methods  generally  paform 
poorly  for  pole  position  estimates.[2,  p.  225]  The  mechanism  used  in  this  study  to  improve  the 
covariance  method  is  to  reduce  the  rank  of  the  estimated  correlation  matrix  with  the  singular  value 
decomposition  (SVD).  The  theory  behind  this  method  is  to  separate  the  contribution  due  to  noise 
from  that  of  the  signal-plus-noise  by  applying  an  SVD  to  the  signal  correlation  matrix.  A  review 
of  the  SVD  is  introduced  next,  followed  by  examples  of  the  reduced  rank  covariance  method  used 
in  this  study. 

a.  The  Singular  Value  Decomposition 

The  singular  value  decomposition  theorem  states  that  any  M  x  N  matrix  where 
M  can  be  decomposed  into  the  product  of  three  matrices. 

R  =  imv^,  (3.20) 

where  the  matrix  U  is  the  M  x  M  unitary  matrix  containing  the  so  called  left  singular  vectors  of  R, 


'■M 


(3.21) 


the  matrix  V  is  an  N  x  N  unitary  matrix  containing  the  right  singular  vectors  of  R, 

I  I 


V  = 


^2 
I  I 


(3.22) 


and  the  matrix  Z  is  an  M  x  N  diagonal  matrix  containing  the  singular  values  of  R, 


CTi  0  0  0 

0  (Tj  0  0 


z  = 


0  0 
0  0 


^  N  ’ 
0 


0 


(3.23) 


where  cti  >  Oi  >  ...  >  On.LI,  p.54-55]  The  SVD  allows  the  user  to  decompose  the  signal  into  its 
principal  components.  The  assumption  is  that  the  singular  values  associated  with  the  signal  will  be 
larger  than  the  singular  values  associated  with  the  underlying  noise.  As  a  result  the  singular  values 
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associated  with  the  signal  and  noise  components  are  separated  from  the  singular  values  associated 
with  the  noise  by  a  gap.  With  this  decomposition  one  can  identify  the  signal  components  and  invert 
only  the  portion  of  the  SVD  decomposition  pertaining  to  the  signal  only. 

b.  Rank  Reduction 

Applying  the  SVD  to  the  covariance  method  leads  to  solving  for  the  a  coefficients 
in  the  following  manner  from  Equation  (3.14); 

a  =  -R*d, 

''■here  ^V(;,1:A')'', 

(3.24) 

R  is  the  pseudo  inverse  of  R  of  rank  k,  where  k  is  the  number  of  large  singular  values  contained  in 
X.  The  rank  reduction  can  be  viewed  as  a  truncated  SVD  that  sets  smaller  singular  values 
associated  with  the  noise  to  zero,  thereby  canceling  ill-conditioning  problems  that  would  occur 
when  inverting  an  almost  singular  matrix.  The  selection  of  the  singular  values  to  keep  is  done 
visually  by  detecting  a  gap  in  the  plot  of  the  singular  values.  The  method  used  in  this  work  was 
done  b>'  \isual  inspection  followed  by  a  comparison  of  the  AR  spectrum  with  the  Fourier  spectrum 
of  the  signal  segment.  We  used  this  method  because  the  underlying  noise  is  not  constant,  and  the 
signal  to  noise  ratio  varies  with  each  signal  segment  analyzed.  This  comparison  allowed  us  to  see 
how  well  the  model  tracts  the  dominant  frequencies  of  the  signal.  Not  being  coanizant  of  the 
resulting  AR  spectrum  could  lead  to  cutting  out  some  of  the  signal  contributions.  When  the  data 
consists  of  a  signal  with  a  high  signal  to  noise  ratio,  the  rank  of  the  AR  covariance  matrix  can 
easily  be  reduced  b}'  detecting  the  gap  between  the  singular  values  of  the  signal  and  noise  and  those 
of  just  the  noise.  The  underlying  assumptions  are  that  the  signal  is  much  stronger  than  the  noise, 
and  is  clo,se  to  being  stationary  in  the  segment  interval.  As  a  result,  a  signal  fitUng  this  criterion 
will  have  singular  values  that  are  much  more  prominent  than  the  singular  values  associated  with 
the  noise.  The  singular  values  associated  with  the  noise  will  be  almost  constant  and  small  in 
magnitude.  The  following  figures  are  typical  examples  of  singular  value  plots  of  the  six  categories 
of  signals  used  in  this  study.  Also  showm  are  plots  of  the  AR  spectrum  plotted  over  that  of  the 
Fourier  spectrum  of  the  typical  signal  .segments. 
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Singular  Values  of  Segment 


Frequency  Response  of  Model  Segment  24  and  Segment  Spectral  Content 


Sampling  Frequency 

Figure  3-17  Pilot  whale  data  a)  singular  values  of  AR  covariance  matrix  of  order  25,  b)  Typical  AR 
and  frequency  spectra  of  data  segment  of  length  512. 


Singular  Values  of  Segment 


Frequency  Response  of  Model  Segment  5  and  Segment  Spectral  Content 


Sampling  Frequency 

Figure  3-18  Earthquake  data  a)  singular  values  of  AR  covariance  matrix  of  order  25,  b)  Typical  AR 
and  frequency  spectra  of  data  segment  of  length  512. 


23 


Figure  3-19  Gray  whale  data  a)  singular  values  of  AR  covariance  matrix  of  order  25,  b)  Typical  AR 
and  frequency  spectra  of  data  segment  of  length  512. 


Sampling  Frequency 

Figure  3-20  Humpback  whale  data  a)  singular  values  of  AR  covariance  matrix  of  order  25,  b) 
Typical  AR  and  frequency  spectra  of  data  segment  of  length  512. 
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Singular  Values  of  Segment 
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Sampling  Frequency 

Figure  3-21  Killer  whale  data  a)  sii^ular  values  of  AR  covariance  matrix  of  order  25,  b)  Typical  AR 
and  frequency  spectra  of  data  segment  of  length  512* 


Singular  Values  of  Segment 


Sampling  Frequency 


Figure  3-22  Sperm  whale  data  a)  singular  values  of  AR  covariance  matrix  of  order  25,  b)  Typical 
AR  and  frequency  spectra  of  data  segment  of  length  512. 


Tabic  3. 1  depicts  the  t>pical  rank  chosen  when  modeling  the  signals  in  this  study.  Note 
that  the  sperm  whale  requires  the  highest  rank,  which  is  due  to  the  wide  frequency  bandwidth  of  the 
signal.  The  gray  whale  also  requires  a  relatively  high  reduced  rank  due  to  the  harmonic  qualities  of 
the  signal.  Finally  note  that  earthquake  and  humpback  whale  signals  require  a  reduced  rank  equal 
to  2. 


Signals 

Average  number  of  singular  values  retained 

Pilot  whale 

12 

Earthquake 

2 

Killer  whale 

10 

Sperm  whale 

17 

Gray  whale 

15 

Humpback  whale 

2 

Table  3.1  Typical  number  of  singular  values  selected  for  retention  for  each  class  of  signal. 


D.  THE  DISCRETE  W.WTELET  TRANSFORMATION 

1.  Introduction 

VVa\  elet  theory  provides  a  unified  framework  for  a  number  of  techniques  that  are  applied 
in  signal  processing.  These  techniques  include  multiresolution  signal  processing,  used  in  computer 
vision,  subband  coding,  developed  for  speech  and  image  compression,  and  wavelet  series 
expansions,  developed  in  applied  mathematics.  The  Discrete  Wavelet  Transform  (DWT)  is  used 
on  this  study  as  a  classification  tool  to  take  advantage  of  the  filters’  constant-Q  spectral 
characteristics  which  may  match  the  spectral  characteristics  of  the  biological  sicnals  under  study 
well.  The  signals  analyzed  in  this  study  are  non-stationary  in  nature.  They  vary  in  frequency 
content,  signal  length,  magnitude  and  background  noi.se.  Analysis  by  classical  means  is  therefore 
difficult.  The  DWT  pro\'idcs  an  alternate  to  the  Discrete  Short-Time  Fourier  Transform, 

(DSl  F  l  y  in  that  unlike  the  DSTFT,  the  DWT  uses  an  analysis  window  that  can  be  dilated  and 
contracted  to  give  different  resolutions  in  frequency  and  time  on  the  time-scale  plane.  The  DWT 
allows  for  the  localization  in  time  of  high  frequency,  fast  changing  signals  and  allows  for  the 
localization  in  frequency  of  low  frequency,  slow  changing  portions  of  the  signal. 

Analogous  to  the  time-fraiuency  plane  in  the  DFT,  the  signal  is  mapped  onto  the  time- 
scale  plane.  Scale  is  inversely  proportional  to  frequency  and  denotes  the  level  of  contraction  or 
dilation  of  the  basis  filter.  Scale  and  level  are  used  interchangeably  and  are  synonymous. 
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This  section  presents  a  review  of  the  DWT  starting  from  its  relationship  to  the  Fourier 
transform  and  the  Short  time  Fourier  transform.  Next  we  present  two  discrete  implementations  of 
the  wavelet  transform;  the  Mallat’s  algorithm  and  the  A-Trous  algorithm.  Finally,  we  present  how 
each  of  these  algorithms  are  applied  in  the  study  to  derive  the  parameters  of  average  energy  per 
scale  for  the  Mallat’s  algorithm  and  average  energy  per  voice  per  scale  for  the  A-Trous  Algorithm. 

2.  The  Continuous  Wavelet  Transform  And  Series 

We  first  present  the  similarities  between  the  WT  and  the  continuous  Fourier  transform 
and  the  Short-Time  Fourier  Transform  to  tie  wavelets  to  a  classical  signal  processing  tool.  The 
continuous-time  Fourier  Transform  is  an  important  tool  in  the  analysis  of  stationary  signals  as  it 
describes  the  signal  using  a  basis  of  complex  exponentials.  The  Fourier  Transform  of  a  signal  is 
given  by: 


F{(o)=]f(t)e-^^dt.  (3.25) 

The  Fourier  transform  has  many  drawbacks  when  applied  to  non-stationary  data,  as  all  non¬ 
stationary  information  will  be  ‘averaged  out’.  The  Short  -Time  Fourier  Transform  (STFT) 
overcomes  some  of  these  drawbacks  by  sliding  a  window,  w(t)  over  the  data  to  be  analyzed.  The 
Fourier  transform  of  each  successive  segment  of  data  is  then  computed  to  extract  the  Fourier 
transform  information  over  each  segment.  The  STFT  is  defined  by: 

STFTj^  ((0,t)  =  J  f{t)yv(t  -  (3.26) 

where  the  finite  windowing  function,  w(t-r),  is  centered  around  time  t.  The  localized  signal  is 
transformed  giving  the  frequency  representation  at  that  time.  The  window  is  then  shifted  along  the 
time  axis  and  the  procedure  is  repeated.  The  resulting  transform  is  time  dependent,  and  forms  a 
time  frequency  representation  of  the  signal. 

A  noted  drawback  to  the  STFT  is  the  fixed  window  length  of  this  transform.  Thus,  the 
transform  can  not  adapt  to  the  changing  characteristics  of  a  signal  at  a  certain  point  in  time.  To 
address  this  drawback,  another  type  of  transform  was  developed,  the  continuous  time  wavelet 
transform  (CTWT).  The  CTWT  is  formed  by  taking  the  inner  product  of  a  signal  with  basis 
functions  that  can  be  both  dilated  or  contracted,  and  shifted  in  time.  The  CTWT  is  defined  as 

CTWT}  (a,b)  =  -^]f  .  (3.27) 
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The  basis  functions,  or  wavelets,  defined  as: 


a  ) 


(3.28) 


arc  oscillator}  in  nature.  They  taper  to  zero  at  infinity  and  negative  infinity  or  are  zero  outside  of 
the  support  interval.  The  argument  a  is  called  the  scale  parameter,  and  the  argument  b  is  the  time 
localization  variable.  By  changing  the  scale  parameter,  the  basis  function  either  contracts,  for  a 
small  a,  or  dilates  for  a  large  a.  Note  that  the  scale  parameter  a  is  inversely  proportional  to  the 
frequency:  a  small  a  denotes  a  high  jffequency,  while  a  large  a  denotes  a  low'  frequency.  Thus  the 
transform  can  adjust  to  the  changing  nature  of  a  signal  by  the  dilation  and  contraction  of  the 
w  a\  elet  function.  An  easy  way  to  visualize  this  difference  is  to  draw  the  time-frequency  tiline  of 
both  the  STFT  and  the  CTWT  side  by  side. 
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Figure  3-23  Tiling  of  the  time-frequency  plane,  (a)  Short-time  Fourier  transform;  (b)  Continuous 

time  wavelet  transform. 

As  pictured,  the  STFT  (left)  has  a  uniform  window'  length  w'hich  creates  a  rectangular  zrid 
o\er  the  time-frequency  plane.  The  uniform  window'  length  provides  the  same  localization  in  time 
for  all  frequency  components  of  the  signal.  The  CTWT  (right)  provides  a  longer  window  for  low 
frequency  signals  that  do  not  change  abruptly  in  time,  and  a  shorter  window  for  high  frequency 
signal  that  change  rapidly  with  time.  Thus  the  CTWT  adjusts  to  the  changing  nature  of  the  sienal. 
Note  that  both  the  STFT  and  the  CTFT  must  satisfy  the  property  that  each  time-frequency  tile,  or 
time-scale  tile,  have  uniform  area. 
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In  addition,  the  function  v<t)  has  to  satisfy  the  following  condition  in  order  to  be  able  to 
reconstruct  the  signal  /(t)  from  its  CTWT: 


l^(6))P 
I  ml 


dco  <  °° , 


(3.29) 


where  'F(co)  is  the  Fourier  transform  of  v<t).  This  ensures  the  transform  is  a  bounded  invertible 
operator  in  the  appropriate  spaces  [6  p.  2645],  If  \(r(t)  tapas  to  zero  at  infinity  and  oscillates,  then 
it  must  have  zero  mean. 


/  =  0. 


(3.30) 


The  inverse  transform  or  reconstruction  of  the  signal  is  defined  as; 


W  jL n  -da  \  a  )  a 


(3.31) 


The  factor  1/a^  in  the  integration  is  the  Jacobian.  The  signal  /(t)  can  be  described  as  the 
summation  of  the  basis  functions  \|/a.b(t)  and  the  coefficients  CTWTf<a,b).  The  constant  Cy 
depends  only  on  the  basis  function  \|/a,b(t).  The  measure  in  this  integral,  dadb,  is  formally 
equivalent  to  integrating  over  time  and  frequency  or  dtdf  [5  p.  14],  We  assume  that  both  the 
wavelets  and  the  signal  are  real- valued  or  complex-analytic  so  only  positive  dilations  need  be 
considered. 

3.  Discrete  Wavelet  Transform 

We  must  consider  the  discrete  version  of  the  CTWT,  as  our  study  deals  with  sampled, 
discrete  signals.  Thus,  in  the  following  we  consider  discrete  values  for  a,  a=2'  where  i  is  termed 
the  octave  of  the  transform.  The  parameter  b  relates  to  discrete  time,  which  leads  to: 


If  we  now  take  I?  to  be  a  multiple  of  u  or  ^  =  2'n  we  have: 

CTWT^  {2'  ,Tn)  =  ^  |  ~ 

The  next  step  is  to  discretize  the  integral  by  replacing  it  with  a  summation  to  obtain  the  wavelet 
series  expansion: 
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(3.34) 


The  sample  rate  has  been  chosen  to  equal  one.  Equation  (3.34)  is  called  the  decimated  wavelet 
transform,  as  indicated  by  the  2'n  on  the  left  hand  side.  The  transform  is  only  computed  every  2' 
samples  at  octave  i. 

To  further  discretize  the  wavelet  function  we  need  to  breakdown  the  wavelets  into  two 
filters,  the  analysis  filter  and  the  scaling  filter.  Through  the  appropriate  use  of  the  discretized 
filters  representing  the  sampled  wavelets,  we  can  build  different  algorithms  that  accomplish 
wavelet  analysis.  Let  g  be  the  discrete  highpass  analysis  filter  obtained  from  sampling  the 
truncated  wavelet  function: 

Sr.  =  r(«)  • 

Proceeding  from  Equation(3.34),  we  can  arrive  at  two  different  algorithms  that  represent 
applicaUons  of  the  discrete  wavelet  transform  (DWT).  [6  p.  2465]  The  difference  in  the  resulting 
algorithms  comes  from  the  definition  of  the  relationship  between  the  scaling  filter,  h,  and  the 
anahsis  filter,  g.  The  scaling  filter  is  a  low  pass  filter  that  yields  the  next  scale  that  the  signal  will 
be  anal\-zed  at.  The  scaling  filter  operates  on  the  signal  spectrum  from  0  to/5/4,  where fs  is  the 
sampling  frequency.  The  analysis  filter  is  a  high  pass  filter  that  defines  the  coefficients  to  be 
anah-zed.  The  analysis  filter  thus  operates  on  the  spectrum  from /5/4  to  fsfl.  The  first  algorithm  is 
Mallat  s  multircsolution  algorithm  which  is  an  orthogonal  wavelet  transform.  The  second 
algorithm  is  the  A-Trous  algorithm,  which  is  a  non-orthogonal  wavelet  transform. 

4.  Multiresolution  Algorithm 

This  section  describes  Mallat’s  multiresolution  algorithm  w'hich  is  illustrated  in  Figure  3- 
24  below,  where  the  ‘‘il  indicates  decimation  by  2.  The  wavelet  coefficients  tf  are  obtained  as 
the  outputs  of  the  combination  of  a  loupass  filter  g(z)  followed  by  decimation  by  2,  and  a  high- 
pass  filter  hfz). 


Figure  3-24  Mallat’s  multiresolution  algorithm. 
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In  the  Multiresolution  algorithm  the  following  signals  are  formed: 


5'^'  =Aih*s‘) 
=Aig*s‘), 


(3.35) 


where  A  is  the  decimation  operation.  The  filter  his  a  lowpass  scaling  filter.  The  filter  g  is  the 
high  pass  discretized  wavelet  function.  This  algorithm  leads  to  an  orthogonal  wavelet  transform 
when  the  wavelet  filters  satisfy  specific  requirements.  In  addition,  the  scaling  filter / and  the 
analysis  filter  g,  satisfy  the  following  properties  [5  p28,  6  p2968]: 

h{L-l-n)  =  (-irgin),  (3.36) 


and  [6  p.  2468]: 

^[h2J-mh2j_„  +  S2j-mS2j-n]~ 
J 

'Lh2n.jg2m-j  = 

j 

ILgn  =  0. 

n 

Zh„=yf2. 


(3.37) 


Several  families  of  wavelets  have  been  found  to  satisfy  the  above  requirements.  In  this 
study  we  used  two  orthogonal  basis  sets;  Coiflet  3  and  Symmlet  8.  These  basis  sets  were  included 
in  the  software  toolbox  for  MATLAB®  “Wavelab  .55”  produced  by  D.L.  Donoho  et.  al.  of 
Stanford  University  [7,8].  Each  of  these  wavelets  were  chosen  for  their  high  degree  of  regularity, 
where  regularity  implies  that:  \|/(n)  =  g”(n)  =  g(-n).  Therefore,  the  DWT  acting  on  the  sampled 
signal  is  exactly  the  sampled  output  of  the  continuous  wavelet  transform  [6  p.  2466].  Figure  3-25 
presents  the  spectral  partitioning  obtained  for  the  DWT  with  Symmlet  8  wavelet  coeflBcients.  The 
high  degree  of  regularity  is  shown  by  the  small  amount  of  leakage  to  the  adjacent  frequency  bands 
covered  by  higher  scales. 
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Figure  3-25  Frequency  resolution  of  niter  bank  using  Symmlet  8  wavelet  coefficients. 

a.  Application  of  the  Multiresolution  Algorithm  to  signal  classification 
The  wavelet-based  feature  coefficients  selected  for  classificaUon  when  using  the 
orthonormal  wavelet  decomposition  are  made  up  of  two  sets  of  coefficients:  1)  the  average  energy 
contained  in  wavelet-based  quantities  obtained  from  scales  2  to  8,  and  2)  the  average  energy 

contained  in  the  low-pass  signal  coefficients  obtained  at  the  same  scales.  Scale  1  is  not  decimated 
and  is  not  used  in  this  application. 

The  a\erage  energy  quantity  E;  obtained  at  scale  /  is  obtained  from: 

I  256-2'  ^ 

where  c..,  represents  the  wa\'clet  coefficient  obtained  at  time  lag  2*^  and  at  scale  /.  The  a\  eracc 
energ\  of  the  low  pass  coefficients  are  found  using  the  same  equation.  The  wavelet  coefficients  c,  k 
and  the  loupass  coefficients  are  derived  using  the  program  Ecoeff  M,  which  calls  function 
F\\T_PO.M  [8].  Thus  a  seven  scale  decomposition  of  a  signal  segment  of  512  points  leads  to  14 
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S.  The  A-Trous  Algorithm 

The  A-Trous  algorithm  differs  from  the  Mallat  algorithm  only  by  the  decimation  of  the 
high  pass  filter  output.  Figure  3-26  below  represents  the  basic  algorithm,  where  the  4'2  represents 
decimation  by  2.  The  output  of  the  transformation  W ,  is  the  decimated  discrete  wavelet  transform 
of  the  original  signal. 


s^— »|f~j-»p|-» 

I  ^ 
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Figure  3-26  A  Trous  wavelet  filter  bank  structure. 

Further  insight  may  be  gained  when  the  algorithm  is  viewed  through  the  frequency  domain 
as  follows: 

A)  Bandpass  the  upper  half  of  the  spectrum,  using  analysis  filter  g,  to  yield  w". 

B)  Lowpass  filter  to  obtain  the  lower  half  of  the  spectrum  [0,7c/2]  using  scaling  filter  f, 
where  the  sampling  frequency  is  equal  to  In. 

C)  Decimate  to  expand  the  lower  half  to  [0,  tt]. 

D)  Go  to  A). 

The  algorithm  is  straightforward.  The  analysis  filter  output  w',  is  obtained  by  using  g  and 
represents  the  high  frequency  information  of  the  signal  s'.  We  then  low  pass  filter  the  remaining 
signal  using  the  scaling  filter  £  By  doing  this,  the  low  frequency  portion  of  the  signal  that  has  not 
been  examined  is  retained  and  is  not  aliased  by  the  upper  frequency  band  of  the  signal  in  the 
dilation  that  follows.  Decimating  the  signal  in  time  dilates  the  signal  in  the  frequency  domain. 

Thus  the  low  frequency  signal  energy  is  now  spread  throughout  the  entire  spectrum.  The  result  is 
octave  z+1. 

Potential  problems  exist  if  we  choose  the  high  pass  filter  g  to  have  a  bandwidth  to  be  less 
than  ir/2.  This  would  cause  part  of  the  signal  not  to  be  examined,  and  thus  be  lost.  If  we  replace 
the  single  analysis  filter  ^  by  a  bank  of  filters  of  the  type  g  to  cover  the  entire  upper  half  of  the 
spectrum,  we  are  introducing  voices.  Two  things  are  accomplished  with  voices:  1)  we  ensure  we 
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ha\  e  sufficient  bandwidth  to  cover  the  upper  spectrum  and  2)  we  add  the  benefit  of  increased 
resolution  in  the  high  frequency  band  of  the  signal.  Thus  the  voices  are  constructed  of  banks  of 
bandpass  modified  Gaussian  filters.  Any  number  of  voices  can  be  used  to  increase  resolution. 

The  difficulty  in  implementing  Equation  (3.34)  to  obtain  the  wavelet  coefficients  is  that,  as 
the  octave,  i.  increases  the  continuous  wavelet,  v}/(t),  has  to  be  sampled  at  more  and  more  points 
creating  a  large  computational  burden.  The  solution  to  this  computational  burden  is  to 
approximate  the  non-integral  values  through  interpolation  via  a  finite  scaling  filter/called  a 
Lagrange  interpolation  filter.  A  Lagrange  interpolation  filter  is  such  that 
[6  p.  2468] 


/  = 


~ir' 


(3.39) 


where  h  is  an  appropriate  Daubechies  wavelet  filter.  In  this  application  we  implemented  a 
Daubechics  4  as  the  basis  for  the  Lagrange  interpolation  filter  as  in  [6  p.  2475]. 

The  Morlet  wavelet  is  used  in  the  A-Trous  implementation  and  is  given  by; 

=  (3.40) 

where  (3  is  the  roll-off  parameter  which  determines  how  fast  the  modified  Gaussian  filter  decays  to 
1/e  of  its  peak  value.  The  center  frequency,  v.of  the  first  modified  Gaussian  filter  or  voice  is  set  at 
some  fraction  of  n  above  7t/2.[6  p.  2478]  The  Fourier  transform  of  \(/(t)  is  given  by 

=  (3.4,) 

The  following  restrictions  need  to  be  satisfied  by  the  parameters  v,  and  P: 


(3.42) 


The  center  frequency  of  the  voice  must  be  greater  than  71/2  in  order  for  g  to  be  in  the  upper  half  of 
thi.  frequency  spectrum.  Next,  in  order  for  \)r(t)  to  be  analytic,  and  admissible. 


V 

2^' 


(3.43) 


Finally  in  order  that  the  spectrum  not  be  aliased, 

v  <  ;r  -  -Ilfi. 
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(3.44) 


Any  number  of  voices,  i.e.  sub-filters  defined  at  each  scale,  can  be  chosen. 

The  number  of  voices  M  needed  to  fill  the  upper  spectrum  is  [6  p.  2478]: 


nil 

2V2i3 


(3.45) 


Voices  add  an  extra  dimension  to  the  DWT  in  that  greater  firequency  resolution  can  thus  be  seen 
per  octave.  This  higher  resolution  is  what  makes  this  implementation  of  the  DWT  of  interest  to  us, 
because  it  lets  us  discriminate  between  different  signals  that  lie  close  together  in  the  frequency 
domain. 


a.  Application  of  the  A-Trous  Algorithm  to  signal  classification 
The  signals  under  study  sometimes  occupy  the  same  octave  level  making  it 
difficult  to  separate  them.  The  A-Trous  algorithm  is  applied  in  this  study  using  a  range  of  three  to 
seven  voices  per  octave  which  inaeases  the  frequency  resolution  of  the  transform  to  better  separate 
the  signals.  Figure  3-27  represents  the  spectral  partitioning  obtained  using  four  voices  per  octave, 
a  center  frequency,  v  =  .857c,  and  a  |5  =  0. 15. 


A  Trous  lnnpl.:4  voice(s)  -  beta:0.15  nu:0.85pi 


Figure  3-27  Frequency  coverage  of  the  A  Trous  algorithm  using  seven  octaves,  p  =  0.15,  and  v  =  0.85 

It. 

As  before,  each  signal  is  segmented  into  512  p)oint  segments  and  normalized  to  have  unit  power  per 
segment.  Seven  octaves  are  used  to  include  the  low  frequency  resolution  of  the  Earthquake  signal. 
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The  resulting  a\erage  energy  per  scale and  voice  j,  E,.y  was  then  computed  for  each  segment 
using: 


£, .  = 


1 


‘■J 


256-2" 


I 

<:=! 


-i.j.t 


izz'y 


8. 


>0.1.  N-l.  (3.46) 

where  N  is  the  number  of  voices  chosen,  and  Cij.k  represent  the  /r*  wat^elet  coefficient  obtained  at 
scale  i  and  \  oice;.  The  average  energy  per  scale  per  voice  per  segment  is  then  processed  by  the 
back-propagation  neural  network  for  classification. 

In  his  chapter,  we  have  introduced  the  methods  of  feature  extraction  used  in  this  work. 
Next  in  Chapter  IV,  the  various  signal  features  are  used  as  inputs  to  a  back-propagation  neural 
network  for  classification  purposes.  We  present  the  concept  of  the  back-propagation  neural 
network  and  the  neural  network  choices  selected  in  this  work. 
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IV.  CLASSIFICATION  VIA  BACK-PROPAGATION  NEURAL 

NETWORK 

A.  INTRODUCTION 

Through  the  feature  extraction  process,  signals  undar  study  are  reduced  from  an  average  of 
40,000  data  points  to  a  much  smaller  number  of  related  coefficients.  Back-propagation  neural 
networks  have  been  proven  useful  in  numerous  complex  classification  problems  [9],  and  this 
section  introduces  the  back-propagation  neural  network  configuration  used  for  the  classification 
procedure. 

The  back-propagation  network  is  a  multilayer  feedforward  network.  Typically  it  consists 
of  an  input  layer,  one  or  more  computational  layers  called  hidden  layers  and  an  output  layer.  The 
network  learns  during  a  supervised  training  session,  where  each  input  vector  has  a  target  output 
vector.  Learning  takes  place  when  input  related  coefficients  are  presented  to  the  network  in  the 
input  layer.  The  input  is  propagated  through  the  network  in  a  forward  direction,  on  a  layer-by- 
layer  basis,  to  the  ou^ut  layer.  The  output  layer  is  compared  to  the  target  classification  and  the 
error  is  back-propagated  through  the  network  layer-by-layer,  neuron  by  neuron,  updating  the 
connection  weights.  The  connection  weights  are  the  memory  of  the  system.  Once  the  network 
converges  on  a  stopping  criteria,  the  weights  become  fixed  and  the  network  can  be  used  for  testing. 
The  testing  procedure  is  conducted  using  data  with  the  same  characteristic  as  those  used  in  the 
training  phases.  The  NN  output  is  then  compared  with  the  known  target  valu^  and  a  classification 
rate  is  tabulated. 

This  section  introduces  the  concept  of  the  back-propagation  network  as  applied  in  this 
thesis,  and  the  network  choices  selected  for  training  the  network. 

B.  THE  BACK-PROPAGATION  NEURAL  NETWORK 

This  study  employed  networks  consisting  of  an  input  layer,  two  hidden  layers,  and  an 
output  layer.  Each  layer  is  fully  connected  to  the  succeeding  layer,  meaning  the  output  of  each 
neuron  is  connected  to  the  input  of  each  neuron  in  the  next  layer.  During  learning,  information  is 
paired  with  a  desired  response  or  target.  The  output  of  the  output  lay^  neurons  is  compared  with 
the  target  producing  the  local  error  at  the  output.  This  learning  scheme  where  the  network  is  given 
both  the  input  and  the  target  classification  is  called  supervised  learning.  The  local  error  is 
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propagated  back  through  the  network,  and  used  to  update  the  connection  weights.  The  back- 
propagation  network  employed  has  a  hetero-associative  memoiy^  meaning  the  pattern  on  recall 
from  memorv'  is  purposely  different  from  the  input  pattem[10  p.  N.  3 15].  In  other  words,  the 


network  learns  higher  order  relationships  for  each  classification  of  input  data  and  classifies  based 
on  these  relationships.  A  diagram  of  a  typical  back-propagation  network  is  given  next. 


1.  The  Processing  Element 

Figure  4-1  presents  the  basic  building  block  of  the  neural  netwwk,  called  the  processing 
element  (pictured  as  circles)  .  A  widely  accepted  diagram  for  the  processing  element  (PE)  is  shovm 
in  Figure  4-2.  Each  PE  linearly  combines  the  individually  weighted  inputs  from  the  previous 
la>’er,  and  a  bias  value.  It  then  transforms  the  combination  through  a  nonlinear  transfer  fimetion  to 
form  the  output  of  that  PE.  Each  output  is  then  input  to  a  processing  element  in  the  next  layer  and 
indi\iduall\'  weighted.  Note  that  the  input  layer  structure  is  different  from  the  remaining  lavers,  as 
it  does  not  process  the  data,  but  serves  as  a  buffer  that  distributes  the  input  to  the  hidden  layers 
where  processing  occurs. 

The  following  notations  are  used  in  this  section  to  define  the  NN  elements! 

•  — >  output  of  the  neuron  in  the  s  layer, 

•  Wj.'"'  -^connection  weight  joining  the  i*  neuron  in  the  [s- 1  ]  layer  to  the  j*  neuron  in 
layer  s, 

•  l/'^  “^w  cighted  summation  of  inputs  to  the  neuron  in  layer  s. 
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Synaptic  Weights 
(including  bias) 

Figure  4-2  Generic  Processing  Element 


2.  The  Transfer  Function 

A  back-propagation  element  transfers  its  inputs  as  follows[10,  p.  NC-64]: 

j  .  (4.1) 

(p  is  traditionally  a  sigmoid  function,  but  can  be  any  differentiable  function.  In  this  study,  a 
hyperbolic  tangent  is  used  and  is  defined  as: 

-e~^ 

(p(z)=  -  (4.2) 

This  transfer  function  is  asymetric  about  the  origin  and  has  a  range  of  output  values  from  -1  to  +1. 

An  error  signal  originates  at  each  output  neuron  of  the  network.  Thus,  the  network  as  a 
whole  has  a  global  error  function  that  is  a  compilation  of  each  OTor  at  each  output  neuron.  Given 
that  a  network  has  some  global  error  function  E  that  is  a  differentiable  function  of  all  of  the 
connection  weights  in  the  network,  the  aitical  parameter  that  is  projected  back  through  the  network 
is  the  local  error.  The  local  error  is  defined  as: 


(jj  _  6E 


(4.3) 
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This  equation  defines  the  local  error  obtained  at  the  f  PE  in  the  s*  layer.  Using  the  chain  rule 
twice  yields  the  relationship  between  the  local  error  at  a  specific  PE  in  level  5  and  all  of  the  local 
errors  at  the  previous  level  s+J: 


=  (4.4) 

t 

The  derivative  of  the  hyperbolic  tangent  transfer  function,  (p,  is: 

(p’  (z)  =  (1.0  +  (p(z))  *  (1.0  -  (p{z))  (4.5) 

Therefore  the  local  error  can  be  revwitten  as: 

=(1.0  +  Ary')(1.0-;ry').X(e,'””.Mf ").  (4.6) 

I 

Equations  4. 1  through  4.4  are  the  main  worldngs  of  the  back-propagation  network.  Again  the  idea 
is  to  forward  propagate  the  input  to  the  output,  compare  the  output  to  the  target  to  determine  the 
local  error  at  the  output,  then  back-propagate  the  local  error  to  the  input  layer.  The  goal  of  the 
supervised  learning  process  is  to  minimize  the  global  error  by  adjusting  the  weights,  imparting 
knowledge  of  the  local  error  to  each  PE.  This  is  done,  as  in  the  LMS  algorithm,  through  the 
gradient  descent  method.  The  gradient  of  the  global  error  is  approximated  by: 

SE  f  6E  n  1 

- =  -  •  - 1 —  ir  — ^  (A 

J  LH”J  '  '  ’ 

The  weights  arc  updated  in  accordance  with  the  size  and  direction  of  the  negative  gradient  on  the 
error  surface  scaled  by  the  learning  coefficient  Icoef  in  accordance  with  the  following  equation 
[10.  p.  NC-66]: 


Airjj’  =  Icoef  ■  x''"") , 


3.  The  Normalized-Cumulative  Delta  Learning  Rule 

Learning  coefficients  are  determined  by  the  specific  learning  algorithm  employed.  This 
study  used  the  Normalized-Cumulative  Della  learning  rule,  where  the  weight  update  equation 
becomes: 
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' = rUy  +  Cj  *  e ,  *  x-  at  each  iteration 


Wij’=w^+/n,j+C2fl^. 

a^'=m^ 

7n/  =  0 


if  learn  count  modulo  Epoch  =  0 


(4.9) 


An  Epoch  is  the  number  of  iterations  the  network  will  use  to  either  converge  or  stop  processing.  Cj 
is  defined  as  the  learning  rate  and  is  related  to  the  Epoch  size.  The  modulo  of  the  Epoch  is  defined 
as  the  number  of  inputs  to  be  considered  as  an  example  set.  The  number  of  the  Epochs  determines 
the  value  for  the  learning  rate.  As  the  number  of  the  Epochs  increase,  the  learning  rate  should 
deaease,  otherwise  the  accumulated  weight  changes  will  cause  the  learning  to  diverge.  C2  is  the 
momentum  term  and  is  used  to  help  smooth  out  the  weight  changes,  mjj  is  the  memory  term  where 
m  is  the  present  memory  of  the  PE  and  m’  is  the  resulting  memory  change.  The  weight  changes  are 
accumulated  over  the  modulo  of  the  Epoch  and  are  appMed  at  the  end  of  the  modulo  of  the  Epoch. 
While  the  network  is  iterating  the  learn  count  modulo  Epoch  is  equal  to  zero.  At  the  end  of  the 
modulo  of  the  Epoch,  learn  count  is  equal  to  1  and  the  weight  changes  are  applied.  The 
Normalized-  Cumulative  Delta  learning  rule  scales  the  learning  rate  Ci  by  one  over  the  square  root 
of  the  epoch  size.  The  modulo  of  the  Epoch  for  this  study  was  set  at  100  training  files  for  each 
network,  where  one  file  is  one  input  vector  of  features  extracted  by  one  of  the  methods  addressed  in 
chapter  three  coupled  with  its  output  target.  The  target  is  the  true  classification  of  the  signal 

4.  MinMax  Tables 

Saturation  of  the  transfer  function  occurs  when  input  to  the  neural  network  is  not  suitably 
scaled  to  the  transfer  function.  The  hyperbolic  tangent  transfer  function  acts  nearly  linear  to  inputs 
which  are  in  the  range  of  +  2.  If  input  values  to  the  network  are  extremely  large,  for  example 
10,000,  even  small  weights  will  cause  saturation  of  the  transfer  functions.  When  the  transfer 
function  is  saturated,  the  derivative  of  the  transfer  function  is  nearly  zero.  This  causes  the  weights 
not  to  be  updated  and  the  network  does  not  learn.  To  avoid  this  hazard,  MinMax  tables  are 
generated  to  scale  the  input  to  the  network  to  the  transfer  function.  This  pre-processing  function  is 
available  in  the  NeuralWorks  professional  II/Plus  software  [10]  and  was  used  in  this  study. 

5.  Network  Architecture 

The  number  of  PEs  in  the  hidden  layer,  and  the  number  of  hidden  layers  in  a  back- 
propagation  neural  network  are  important  decisions  in  network  architecture.  Most  hack- 
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propagation  networks  will  have  one  or  nvo  hidden  layers  with  the  number  of  PEs  in  the  hidden 
la\  ers  falling  in  between  the  number  of  input  values  and  the  number  of  output  PEs.  The  number  of 
PEs  depends  on  the  complexit>-  of  the  relationships  betw-cen  classifications  of  data.  Signals  that 
are  not  easily  separated  will  require  more  PEs  to  distinguish  between  them.  A  rule  of  thumb  most 
quoted  in  literature  for  a  single  hidden  layer  network  is  [9  p.  39]. 


^  #  of  training  cases 

5  X  (m  +  n) 


(4.10) 


where; 


•  A  is  the  number  of  PEs  in  the  hidden  layer, 

•  cases  are  the  number  of  records  in  the  training  set, 

•  m  is  the  number  of  PEs  in  the  output  layer, 

•  n  is  the  number  of  PEs  in  the  input  layer. 

Calculating  the  number  of  PEs  in  the  hidden  layer  for  the  reduced-rank  covariance  coefficients 
equals: 


540 

5  X  (6  +  25) 


=  3.48  =  4  . 


Actually  try  ing  the  network  with  twenty  five  input,  four  Wdden,  and  six  output  PEs  produced  a 
network  that  would  not  converge  to  a  reasonable  classification  rate  in  a  five  hour  time  frame.  The 
nehvork  architecture  that  constantly  converged  in  a  reasonable  time  frame  for  this  study  included  a 
first  hidden  layer  equal  to  the  number  of  inputs,  followed  by  fifteen  in  the  second  hidden  layer,  and 
SIX  output  classes.  The  networks  usually  trained  for  an  hour  to  an  hour  and  an  half  to  reach 
stopping  criteria.  Some  data  sets  responded  better  with  different  parameters,  however,  generally 
speaking  this  choice  worked  well.  A  verj'  limited  attempt  was  made  to  optimize  the  network, 
resulting  in  no  sigmficant  performance  improvement.  The  emphasis  of  this  work  was  to  prove  the 
concept  of  using  the  different  feature  extraction  methods  with  a  neural  network  classifier. 
Optimizing  the  neural  neUvork  structure  may  increase  the  classification  rate,  but  generally  will  be 
close  to  the  values  obtained  in  this  study.  The  more  effective  the  feature  extraction  is  at  finding 
unique  values  for  each  signal,  the  better  the  results  will  be  using  the  neural  neUvork  as  a  classifier. 


The  signals  used  in  this  stud>^  were  not  easily  separated,  and  thus  the  two  hidden  layer  neUvork 
worked  much  better  than  the  single  hidden  layer  network  derived  from  the  rule  of  thumb. 
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6.  The  Classification  Rate 

This  study  employed  a  classification  rate  as  the  measure  of  performance  for  the  network. 
The  idea  behind  the  classification  rate  is  for  the  network  to  pick  a  winner,  which  is  simply  the 
output  PE  with  the  largest  value.  Thus,  if  we  compare  the  winner  with  the  target  we  have  a  binary 
yes  or  no  answer  for  correctness  in  classification.  NeuralWorks  built  such  an  instrument  into  the 
software. 

The  classification  rate  instrument  provides  a  two-dimensional  comparison  of  desired 
results  to  actual  network  response.  This  instrument  is  appropriate  for  this  problem  because  the 
network  needs  to  classify  each  signal  as  one  of  the  six  biologic  or  seismic  categories.  The  output 
response  of  the  network  is  thresholded  with  a  1-of-  n  transformation.  The  winner  is  valued  at  1 , 
and  the  others  are  valued  at  zero  forming  the  winning  vector.  The  sum  of  the  winners  are  divided 
by  the  number  of  input  sets  per  output  category.  This  reveals  how  the  network  classified  each 
category  of  input  data  as  a  number  from  0  to  1.0,  and  the  overall  classification  rate  of  the  entire  net 
is  the  average  of  the  six  correct  classification  rates  per  category. 

The  dimension  of  the  classification  rate  instrument  is  a  square  of  size  of  the  output  layer. 

In  this  study,  the  output  layer  contains  PEs  representing  the  six  classes  of  signals.  The 
classification  rate  instrument  was  thus  a  6x6  matrix.  A  value  of  1.0  in  any  of  the  6  boxes  per 
column  means  for  that  input  all  were  classified  as  that  particular  output.  A  zero  corresponds  to 
none  of  the  input  were  classified  as  that  output.  The  perfect  classification  would  result  in  1.0s  on 
the  antidiagonal  and  zeros  every  where  else.  The  values  obtained  from  this  instnunent  are  included 
in  the  results  section  incorporated  in  the  output  matrix. 

One  drawback  to  the  classification  rate  is  that  this  quantity  doesn’t  relate  how  close  the 
maximum  PE  output  value  is  from  the  other  output  values.  Thus  in  addition  to  the  Classification 
Rate,  the  mean  and  standard  deviation  of  each  output  PE  value  are  computed  to  give  additional 
information  regarding  the  NN  performance. 

7.  Training  and  Testing  The  Neural  Network 

The  goal  of  training  a  back-propagation  network  is  to  encode  as  many  training  examples 
as  needed  to  correctly  generalize  [1 1  p.  176].  A  network  is  said  to  “generalize”  well  when  the 
classification  rate  computed  by  the  network  is  reasonably  high  for  test  data  which  was  not  used 
during  the  training  phase.  However,  it  is  assumed  that  the  test  data  is  drawn  from  data  with  the 
same  general  characteristics  as  the  training  data.  The  generalization  process  can  be  viewed  as  non- 
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linear  cur\e  fitting,  or  a  non-linear  input-output  mapping.  Thi.s  view  point  allows  us  to  visualize 
the  learning  process  not  as  a  ‘magical  life  giving  process’  but  as  a  good  non-linear  interpolation  of 
the  data.  A  network  that  is  designed  well  and  is  adequately  trained  should  still  have  a  high 
classification  rate  when  tested  on  data  that  is  slightly  different  from  the  data  used  for  training. 
When  the  network  learns  too  many  ambiguous  input-output  relationships,  that  is,  when  it  is 

os  entrained,  the  network  may  produce  poor  results  even  when  tested  with  data  that  is  only  slightly 
different  from  the  testing  set. 

NeuralWorks  provides  a  learning  and  testing  scheme  called  Savehest  to  mitigate  the  effects 
of  overtraining.  The  option  Savehest  trains  the  network  for  a  user  prescribed  number  of  epochs, 
and  tests  the  network  with  the  testing  file.  The  result  is  compared  to  the  previous  saved  best 
network.  The  network  is  saved  based  on  the  criteria,  i.e.,  highest  classification  rate,  or  lowest  root- 
mean  square  error.  This  process  repeats  for  the  user  prescribed  number  of  failures  to  produce  a 
new  best.  The  network  when  finished  will  revert  to  the  saved-best  network,  thus  averting  an 
os  ertrained  network  that  memorizes  the  training  set  of  data. 

In  thi.s  study,  the  length  of  training  sets  before  testing  is  set  at  10,000  epochs  and  the 

number  of  retries,  or  failures  to  produce  a  new  best,  is  set  at  20.  The  criteria  used  was  the  overall 
classification  rate. 

The  training  and  testing  files  were  built  using  the  signals  as  described  in  the  previous 
chapters.  Using  MATLAB  ,  a  matrix  was  constructed  with  each  column  representing  the  features. 
For  example,  in  the  case  of  the  reduced-rank  covariance  method,  the  number  of  features  was  the  25 
model  parameters  per  segment,  in  the  case  of  the  A-Trous  algorithm  with  4  voices  per  octave,  the 
number  of  features  was  equal  to  28  average  energy  values  per  voice  and  segment.  The  matrix  was 
appended  with  the  target  vector  consisting  of  six  values  of  either  a  one  or  a  zero  corresponding  to 
the  output  neuron  designated  to  the  classification  of  the  data.  The  data  was  separated  into  training 
files  and  testing  files  by  using  two  thirds  of  the  smallest  file  as  the  minimum  number  of  training 
files  per  class  of  data,  which  amounted  to  86  files.  The  remaining  one  third,  of  the  smallest 
category  was  used  as  minimum  number  of  testing  files  per  class.  Training  files  and  testing  files 
were  set  up  for  each  of  the  feature  extraction  methods.  The  features  include  reduced-rank 
cos-ariancc(AR)  coefficients,  ALE  pre-processed  AR  coefficients,  wavelet  average  energy  per 
scale  for  both  Symmlet  8,  and  Coiflct  3  coefficients,  ALE  pre-processed  wavelet  average  energy 
per  scale  for  both  Symmlet  8,  and  Coifiet  3  coefficients,  AR  and  wavelet  methods  combined  for 
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Symmlet  and  Coiflet  coefficients,  ALE  pre-processed  AR  and  wavelet  combination  for  both 
Symmlet  and  Coiflet  coefficients,  and  finally  the  average  energy  per  voice  per  scale  using  the  A 
Trous  method  for  3, 4,5,6,  and  7  voices  per  octave.  In  all,  40  different  networks  were  trained  using 
the  fifteen  above  categories.  The  twelve  best  networks  are  presented  in  the  results. 
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V.  RESULTS 

The  top  twelve  NN  implementations  of  all  networks  considered  in  this  study  are 
summarized  in  T able  5.1.  Note  that  ALE  based  noise  reduction  was  not  apphed  when  using  the  A- 
Trous  implementation  due  to  the  high  classification  rate  already  achieved.  The  network  numbers  in 
the  second  column  of  the  table  represent  the  number  of  PEs  in  the  input  layer/  hidden  layer  1/ 
hidden  layer  2/  and  the  output  layer.  The  overall  classification  rate  is  as  defined  earlier  in  Chapter 
IV  section  6.  In  many  of  the  feature  extraction  methods  used,  we  tried  to  “clean”  the  signals  by 
applying  the  ALE  algorithm  to  reduce  the  noise  and  increase  the  ability  of  the  NN  to  classify  the 
signal.  The  last  and  second  to  last  method  used  both  the  AR  and  Wavelet  feature  extraction 
methods  on  the  signal  and  combined  the  features  into  one  vector  for  the  NN  to  classify.  The  last 
method  took  this  approach  one  step  further  by  using  the  ALE  algorithm  first. 


Feature  Extraction  Method 

Network 

Overall  Classification  Rate 

AR  Coefficients 

25/20/15/6 

84.50  % 

ALE/AR  Coefficients 

25/20/15/6 

83.94  % 

Wavelet  (Coiflet  3) 

14/14/10/6 

78.05  % 

14/14/10/6 

84.67  % 

ALE/Wavelet  (Coiflet  3) 

14/14/10/6 

88.76  % 

ALE/Wavelet  (Symmlet  8) 

14/14/10/6 

95.17  % 

A-Trous  (4  Voices  Per 

Scale) 

28/28/15/6 

96.41  % 

A-Trous  (5  Voices  Per 

Scale) 

35/20/15/6 

93.46  % 

A-Trous  (6  Voices  Per 

Scale) 

42/42/15/6 

96.73  % 

A-Trous  (7  Voices  Per 

Scale) 

49/49/15/6 

95.10  % 

AR  &  Wavelet  (Coiflet  3) 

39/30/15/6 

86.64  % 

ALE/AR  &  Wavelet  (Coiflet 
3) 

39/30/15/6 

95.78  % 

Table  5.1  Feature  Extraction  Performances 


Table  5. 1  shows  that  the  highest  scoring  network  is  obtained  for  the  A-Trous  algorithm 
with  six  voices  per  scale.  The  best  solution,  requiring  the  least  preprocessing  and  the  smallest 
neural  network,  is,  however  the  A-Trous  algorithm  with  four  voices  per  scale.  Five  networks 
achieved  an  overall  classification  rate  in  excess  of  95  %.  This  is  our  self  induced  threshold  for  a 
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successful  classification  scheme.  Note  that  performance  could  potentially  be  farther  improved  by 
optimizing  the  neural  network  for  each  feature  extraction  method. 

Tables  5.2  to  5.13  and  Figures  5.1  to  5.12  present  the  performances  obtained  with  each  of 
the  feature  extraction  methods  summarized  in  table  5.1.  For  clarity  purposes,  a  detailed 
explanation  of  Table  5.2  is  presented  next.  Tables  5.3  to  5.13  follow  the  same  presentation. 

The  T'  row  in  Table  5.2  indicates  the  number  of  testing  files  presented  to  the  NN  for 
classification  in  each  class  of  signal.  The  1"  column  indicates  the  number  of  testing  files  classified 
in  a  specific  class.  All  other  rows  show  the  mean  and  standard  deviation  (STD)  obtained  at  each 
output  node.  The  classification  rate  (CR)  is  as  defined  in  Chapter  IV  section  6.  Finally,  the 
number  of  files  classified  in  each  class  of  signal  is  included. 

Row  2  to  6  present  individual  performance  results  obtained  for  each  class.  For  example, 
Row  2  shows  that  56  files  were  classified  as  “Sperm  Whale”  data,  and  that  41  out  of  5 1  files  were 
correcth  classified,  leading  to  a  classification  rate  CR  =  81%.  Miss-classified  sperm  whale  data 
were  classified  as  either  killer  whale  (7  files)  or  earthquake  files  (3  files).  In  addition,  average 
output  node  levels  are  presented.  For  example,  the  average  output  level  obtained  for  the  Sperm 
VTale  output  node  when  the  NN  is  presented  with  Sperm  Whale  data  is  0.6602,  and  its  standard 
de\iation  (STD)  is  equal  to  0.3602.  Note  that  the  Sperm  \\qiale  output  node  level  significantly 
drops  dowm  to  an  average  of  0.0656  when  the  NN  is  presented  with  other  types  of  signals(as  seen 
across  the  top  row),  as  expected.  The  main  diagonal  starting  from  the  upper  left  of  the  table  to  the 
lower  right  contains  the  number  of  correct  classifications  obtained  for  the  testing  data. 

Performance  results  contained  in  Table  5.2  are  also  graphically  presented  in  Figure  5. 1 .  Class 
number  1  through  6  refer  to  Sperm,  Killer,  Humpback,  Gray,  Pilot  whales  and  earthquake  data. 

Test  files  1  to  51  represent  Sperm  Whale  data.  Test  files  52  through  102  represent  Killer  whale 
data,  test  files  103  through  153  represent  Humpback  Whale  test  data,  test  files  153  through  204 
represent  Gra\-  WTiale  data,  test  files  205  through  256  represent  Pilot  Whale  data,  and  test  files 
257  through  306  represent  earthquake  data. 
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A.  PERFORMANCE  RESULTS  OBTAINED  USING  REDUCED-RANK  AR 
COEFFICIENTS 

1.  No  ALE  Preprocessing 

Figure  5. 1  and  Table  5.2  present  the  NN  output  classification  results  obtained  using 
reduced  rank  coefficients  (AR). 


Distribution  of  Neural  Net  Output  File 


Figure  5-1  Distribution  of  neural  network  classifications  obtained  using  reduced-rank  AR 
coefficients.  Overall  classification  rate:  84.50%. 

The  overall  classification  rate  of  84.50  %  is  somewhat  disappointing.  However,  note  that 

biological  and  earthquake  AR  frequency  contents  partially  overlap.  As  a  result,  the  NN  has 

difficulties  classifying  the  data  correctly. 
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Mean 

(STD) 

CR-Tf 

306  filc.s 

Sperm 

Input 

51  files 

Killer 

Input 

51  file.s 

Humpback 

Input 

51  files 

Gray 

Input 

51  files 

Pilot 

Input 

51  files 

Earthquake 

Input 

51  files 

Sperm 

Output 

56  flics 

0.6602 

(0.3602) 

81.00 -Tr 

41  file.s 

0.2395 

(0.3010) 

27.45  % 

14  flics 

0.0295 

(0.1746) 

0  9c 

0.0217 

(0.1717) 

0  9c 

-0.0408 

(0.1790) 

1.96% 

1  file 

0.0781 

(0.2457) 

0% 

Killer 

Output 

44  files 

0.2389 

(0.4447) 

13.29 ‘T 

7  files 

0.8464 

(0.3054) 

66.67  % 

34  files 

0.0240 

(0.1102) 

0  9c 

-0.0393 

(0.1380) 

5.71  9c 

3  files 

-0.0550 

(0.1119) 

0% 

-0.0171 

(0.1240) 

0% 

Humpback 

Output 

52  flies 

0.0072 

(0.0773) 

-0.0571 

(0.1408) 

ore 

0.9530 

(0.1961) 

94.29  9c 

48  files 
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Table  5.2  Distribution  of  neural  network  classifications  obtained  using  reduced-rank  AR 
coefficients;  overall  classification  rate:  84.5  % 
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2.  ALE  Preprocessing  Applied  to  Data 

Figure  5.2  and  Table  5.3  present  classification  results  obtained  when  the  underwater  data 
is  first  pre-processed  using  the  adaptive  line  enhancer  (ALE)  filter  introduced  earlier  in  Chapter  III 
section  3.  This  filter  is  applied  to  emphasize  the  narrow  band  contents  contained  in  the  data  and 
decrease  wide-band  noise.  Next  the  reduced  rank  AR  coefficients  of  the  filtered  data  are  computed 
to  be  used  as  feature  parameters.  Note  that  the  overall  classification  performance  decreased.  A 
possible  explanation  is  that  the  ALE  step  removes  some  of  the  unique  AR-  features  of  the  signal, 
thereby  making  it  more  difficult  for  the  NN  to  classify  the  data  correctly. 
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Figure  5-2  Distribution  of  neural  network  classifications  obtained  with  reduced-rank  AR 
coefficients;  ALE  preprocessing  applied  to  the  data;  overall  classification  rate:  83.94%. 
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Table  5.3  Distribution  of  neural  network  classifications  obtained  with  reduced-rank  AR 
coefficients;  ALE  preprocessing  applied  to  the  data;  overall  classification  rate:  83.947c. 
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B.  CLASSIFICATION  RESULTS  OBTAINED  WITH  WAVELET-TYPE 
PARAMETERS 


1.  Orthogonal  Wavelet  Decomposition 
a.  No  ALE  preprocessing: 

This  section  presents  results  obtained  for  the  classification  scheme  using 
orthonorraal  wavelet-type  quantities  as  NN  inputs.  The  resulting  classification  rate  is  nearly 
identical  to  that  obtained  with  the  reduced-rank  AR  method  in  the  case  of  the  Symmlet  8  basis  set. 
However,  this  wavelet-based  method  offers  the  advantage  of  requiring  fewer  NN  input  parameters. 
Fourteen  coefQcients  are  used  as  compared  to  twenty-five  coefficients  for  the  reduced  rank  AR 
method.  For  this  reason,  this  method  represents  a  better  solution  even  though  the  overall 
classification  rate  is  the  same.  The  results  obtained  using  the  Coiflet  3  basis  set  do  not  reach  the 
level  obtained  by  the  AR  and  the  Symmlet  8  basis  set.  Detailed  results  are  contained  in  Figure  5-3 
and  Table  5.4  for  Symmlet  8  wavelet  coefficients.  Results  for  Coiflet  3  wavelet  coefficients  are 
contained  in  Figure  5-4  and  Table  5.5. 
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Figure  5-3  Distribution  of  neural  network  classifications  obtained  using  the  orthonormal  wavelet 
decomposition;  Symmlet  8  basis  set;  overall  classiHcation  rate:  84.67%. 
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Table  5.4  Distribution  of  neural  network  classirications  obtained  using  the  orthonormal  wavelet 
decomposition:  Symmlet  8  basis  set;  overall  classification  rate:  84.67%. 
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Figure  5-4  and  Table  5.5  below  displays  the  results  of  using  Coiflet  3  basis  set.  This 
feature  extraction  technique  achieved  an  overall  classification  of  78.05%.  The  results  are 
disappointing  in  that  they  are  far  below  that  obtained  by  using  the  Symmlet  8  basis  set  and  the  AR 
neural  networks. 


Distribution  of  Neural  Net  Output  File 


Figure  5-4  Distribution  of  neural  network  classifications  obtained  usii^  the  orthonormal  wavelet 
decomposition;  Coiflet  3  basis  set;  overall  classification  rate:  78.05%. 
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Table  5.5  Distribution  of  neural  network  classificatioas  obtained  using  the  orthonormal  wavelet 
decomposition;  Coiflet  3  basis  set;  overall  classification  rate:  78.059^. 
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b.  ALE  pre-processing  applied  to  data 

This  section  presents  classification  results  obtained  when  applying  the  ALE  noise 
reduction  technique  to  the  data  before. using  Mallat’s  algorithm.  In  contrast  to  the  AR  process,  the 
multiresolution  algorithm  greatly  benefited  from  the  ALE  pre-processing.  The  classification  rate  is 
up  to  95.17%  for  the  Symmlet  8  basis  set.  Both  the  Symmlet  8  basis  set  and  the  Coiflet  3  basis  set 
increased  the  ovCTall  classification  rate  by  ten  percent  by  first  pre-processing  with  the  ALE  filter. 
Detailed  results  are  presented  in  Figure  5-5  and  Table  5.6  for  the  Symmlet  8  basis  set  next.  Figure 
5-6  and  Table  5-7  present  the  data  for  Coiflet  3  basis  set. 
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Figure  5-5  Distribution  of  neural  network  classifications  obtained  using  the  orthonormal  wavelet 
decomposition;  Symmlet  8  basis  set;  ALE  pre-processing;  overall  classification  rate:  95.17%. 
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Table  5.6  Distribution  of  neural  network  classifications  obtained  asing  the  orthonormal  wavelet 
decomposition:  Symmlet  8  basis  set;  ALE  pre-processing;  overall  classification  rate:  95.17^^. 


Figure  5-6  and  Table  5.7  display  the  results  of  the  ALE  Coiflet  3  basis  set.  The  ova-all 
classification  rate  of  88.75%  is  a  drastic  improvement  over  the  Coiflet  3  set  alone,  however  it  does 
not  quite  match  the  95. 17%  achieved  by  the  ALE  Symmlet  8  basis  set.  It  is  an  improvement  over 
the  AR  coefficient  method  and  did  require  a  smaller  neural  network  to  implement.  In  this  respect, 
it  is  a  more  successful  implementation.  It  does  not  meet  the  95%  self  induced  threshold  for 
success. 
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Figure  5-6  Distribution  of  neural  network  classifications  obtained  using  the  orthonormal  wavelet 
decomposition;  Coiflet  3  basis  set;  ALE  pre-processing;  overall  classification  rate:  88.75% 
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Table  5.7  Distribution  of  neural  network  classincatioas  obtained  using  the  orthonormal  wavelet 
decomposition:  Coiflet  3  basis  set;  ALE  pre-processing;  overall  classification  rate:  88J5Vc. 
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2.  Non-Orthogonal  A-Trous  Wavelet  Decomposition 

This  section  presents  classification  results  obtained  when  applying  the  A-Trous  algorithm. 
Note  the  ALE  noise  reduction  technique  is  not  applied  to  the  data  as  the  performance  is 
satisfactory  without  it.  The  A-Trous  algorithm  takes  advantage  of  a  better  frequency  resolution 
obtained  when  using  multiple  voices.  Results  show  that  classification  performance  is  improved 
using  this  method  as  compared  to  the  other  methods  used  in  this  study. 

The  A-Trous  decomposition  is  implemented  with  four  different  combinations  of  voices. 
Four,  five,  six,  and  seven  voices  per  scale  are  presented.  Results  show  that  the  overall  best 
classification  rate  is  obtained  when  using  six  voices  per  scale,  however  the  size  of  the  NN  was 
significantly  larger.  Figures  5-7  through  5-10  and  Tables  5.8  through  5.11  present  the 
p^formance  results. 

a.  A-Trous  Implementation  with  four  voices  per  scale 
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Figure  5-7  Distribution  of  neural  network  classifications  obtained  using  the  A-Trous 
implementation;  4  voices  per  scale;  overall  classification  rate:  96.41%. 
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1.1055 

-0.1058 

0.1253 

-0.0932 

0.2603 

-0.0588 

Output 

(0.0768) 

(0.0803) 

(0.2209) 

(0.1125) 

(0.2989) 

(0.0481) 

51  flies 

98.04 

50  files 

1.96  97 

1  file 

0  9c 

0  97 

0  9c 

0% 

Killer 

-0.0279 

0.8835 

-0.0578 

-0.0378 

0.0317 

-0.0599 

Output 

(0.2152) 

(0.2500) 

(0.1303) 

(0.1090) 

(0.2125) 

(0.1133) 

46  files 

O^'c 

90.20  97 

46  files 

0  9c 

0  9c 

0  9c 

0% 

Humpback 

-0.0081 

0.0571 

1.0518 

-0.0086 

-0.0540 

0.1096 

Output 

(0.0680) 

(0.1544) 

(0.1287) 

(0.0826) 

(0.0655) 

(0.1198) 

51  files 

OTi- 

0  9c 

100% 

51  files 

0  9c 

0  9c 

0% 

Gray 

-0.0709 

-0.0566 

-0.1007 

1.0186 

-0.0055 

0.0018 

Output 

(0.0672) 

(0.0845) 

(0.0424) 

(0.1749) 

(0.1646) 

(0.1737) 

53  flics 

OTf 

0  9c 

097 

98.04  97 

50  files 

5.88  97 

3  files 

0% 

Pilot 

0.0668 

-0.0250 

0.0008 

0.1808 

0.7893 

-0.0273 

Output 

(0.1091) 

(0.0857) 

(0.0876) 

(0.1753) 

(0.2384) 

(0.1775) 

52  files 

1.96  Tr 

1  file 

7.84  9c 

4  files 

0  97 

0  9c 

1 

92.16% 

47  files 

0% 

Earthquake 

-0.0799 

-0.0862 

-0.1247 

0.0248 

0.2086 

1.0175 

Output 

(0.0371) 

(0.0699) 

(0.0009) 

(0.1136) 

(0.2744) 

(0.2166) 

53  flics 

097 

097 

0  97 

1.96  97 

1  file 

1.96% 

1  file 

100  % 

51  files 

Table  5.8  Distribution  of  the  neural  network  classificatioas  obtained  using  the  A-Trous 
implementation;  4  voices  per  scale;  overall  classification  rate:  96.41% 


62 


b.  A-Trous  Implementation  with  five  voices  per  scale 
Figure  5.8  and  Table  5.9  display  the  results  obtained  from  5  voices  pre  scale. 
Note  that  the  resultant  overall  classification  rate  is  less  than  that  obtained  from  four  voices  per 
scale.  This  was  not  expected.  The  expectation  was  that  an  increase  in  resolution  of  the  feature 
extraction  would  produce  an  increase  in  the  overall  classification  rate. 
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Figure  5-8  Distribution  of  neural  network  classifications  obtained  using  the  A-Trous 
implementation;  5  voices  per  scale;  overall  classification  rate:  93.46  % . 
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Mean 

(STD) 

CR. 

306  flics 

Sperm 

Input 

51  files 

Killer 

Input 

51  files 

Humpback 

Input 

51  files 

Gray 

Input 

51  files 

Pilot 

Input 

51  files 

Earthquake 

Input 

51  files 

Sperm 

Output 

50  flics 

1.0772 

(0.1719 

98.04  <rr 

50  files 

-0.0993 

(0.0937) 

0% 

0.1176 

(0.1915) 

0  57 

-0.1176 

(0.0225) 

0  57 

0.2158 

(0.2746) 

0% 

-0.0244 

(0.0847) 

0% 

Killer 

Output 

50  flics 

-0.0241 

(0.2131) 

0  9f 

0.8539 

(0.2843) 

86.27  57 

44  files 

-0.0205 

(0.1391) 

0  57 

0.0344 

(0.2557) 

5.88  57 

3  files 

0.0585 

(0.2295) 

1.96% 

1  files 

-0.0349 

(0.0497) 

0% 

Humpback 

Output 

49  files 

0.1622 

(0.3230) 

0  57 

-0.0119 

(0.0822) 

0  57 

1.0302 

(0.2397) 

96.08  57 

49  files 

0.0658 

(0.2254) 

0% 

0.0883 

(0.2702) 

1.96% 

1  file 

0.0450 

(0.1794) 

0% 

Gray 

Output 

53  files 

-0.0877 

(0.0376) 

057 

0.0312 

(0.2181) 

11.76  57 

6  files 

-0.0973 

(0.0332) 

057 

0.8953 

(0.3045) 

92.16  57 

47  files 

-0.0364 

(0.1775) 

7.84  % 

4  files 

0.0257 

(0.1469) 

0% 

Pilot 

Output 

51  files 

-0.(4437 

(0.1527) 

1.96  57 

1  file 

-0.0075 

(0.1035) 

1.96 

1  files 

0.0159 

(0.0980) 

0  57 

0.0592 

(0.1890) 

1.96% 

1  files 

0.7276 

(0.3100) 

88.24  % 

45  files 

0.0036 

(0.0991) 

0% 

Earthquake 

Output 

53  flies 

-0.0660 

(0.0500) 

057 

-0.1115 

(0.0310) 

057 

-0.1215 

(0.0076) 

3.92  % 

2  files 

0.0119 

(0.1349) 

0% 

0.2458 

(0.3796) 

0% 

1.0710 

(0.0951) 

100% 

51  files 

Table  5.9  Distribution  of  the  neural  network  classifications  obtained  using  the  A-Troas 
implementation;  5  voices  per  scale;  overall  classification  rate:  93.46  % 


X 

c.  A-Trous  Implementation  with  six  voices  per  scale 
This  implementation  of  the  A-Trous  algorithm  has  the  highest  classification  rate  of 
all  tested.  Even  though  this  implementation  has  the  overall  best  classification  rate,  it  also  has  more 
false  earthquake  classifications  than  eitha  four  voices  or  five  voices  per  scale.  In  addition  it 
requires  a  higher  number  of  NN  input  coefficients  and  correspondingly  a  more  complex  NN. 
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Figure  5-9  Distribution  of  the  neural  network  classifications  obtained  using  the  A-Trous 
implementation;  6  voices  per  scale;  overall  classification  rate:  96.73%. 
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Mean 

(STD) 

CR<T 

306  flies 

Sperm 

Input 

51  files 

Killer 

Input 

51  files 

Humpback 

Input 

51  files 

Gray 

Input 

51  files 

Pilot 

Input 

51  files 

Earthquake 

Input 

51  flies 

Sperm 

1.1107 

-0.0848 

0.1438 

-0.1224 

0.2516 

-0.0579 

Output 

50  files 

(0.0751) 

(0.1879) 

(0.0100) 

(0.3110) 

(0.0670) 

(0.0481) 

98.04  91 

50  flics 

0  % 

0  91 

0% 

0% 

0  9c 

Killer 

-0.0735 

0.9539 

0.0807 

-0.0271 

-0.0321 

-0.0340 

Output 

(0.1462) 

(0.2745) 

(0.1903) 

(0.1585) 

(0.0450) 

(0.1133) 

51  files 

0  91 

98.04  91 

50  files 

0% 

1.96  91 

1  file 

0% 

0% 

Humpback 

-0.0468 

0.2230 

1.0638 

0.0304 

-0.0205 

0.0399 

Output 

(0.0856) 

(0.2996) 

(0,1425) 

(0.1423) 

(0.1759) 

(0.1198) 

49  files 

0  91 

0  91 

96.08  % 

49  files 

0  9c 

0% 

0% 

Gray 

-0.0916 

0.0776 

-0.1028 

0.9014 

-0.0263 

0.0083 

Output 

(0.0480) 

(0.3277) 

(0.3471) 

(0.1959) 

(0.1633) 

(0.1737) 

52  files 

091 

0% 

091 

96.08  91 

49  files 

5.88  % 

3  files 

0% 

Pilot 

0.0221 

-0.0585 

0.0136 

0.0238 

0.8450 

-0.0198 

Output 

(0.2134) 

(0.0892) 

(0.2466) 

(0.3758) 

(0.1460) 

(0.1775) 

48  files 

1.96  91 

1  file 

1.96% 

1  file 

0  91 

0  91 

92.16% 

46  files 

0% 

Earthquake 

-0.1000 

-0.1159 

-0.1233 

0.1588 

0.1851 

1.0276 

Output 

(0.0249) 

(0.0198) 

(0.2570) 

(0.3106) 

(0.1454) 

(0.2166) 

55  files 

091 

0  91 

3.92  % 

2  files 

1.96% 

1  file 

1.96% 

1  file 

100  % 

51  files 

Table  5.10  Distribution  of  the  neural  network  classifications  obtained  asing  the  A-Trous 
implementation;  6  voices  per  scale;  overall  classification  rate:  96.73  ^ . 
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d.  A-Trous  Implementation  with  seven  voices  per  scale 
Results  show  an  overall  classification  rate  of  95.10%,  as  indicated  earlier  in  Table 
5.1.  Note  that  the  NN  implementation  complexity  has  increased  due  to  the  larger  number  of  input 
NN  coefficients.  This  NN  configuration  has  49  PEs  in  the  input  layer,  49  PEs  in  hidden  1,  15  PEs 
in  hidden  2,  and  6  output  PEs.  This  is  the  largest  network  considered  in  this  study.  Figure  5-10  and 
Table  5.11  present  classification  results  obtained  using  the  set  of  averaged  wavelet-type 
coefficients.  We  note  that  the  NN  implementation  obtained  with  four  voices  per  scale  has  a  better 
overall  classification  rate  using  a  much  smaller  network. 
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Figure  5-10  distribution  of  neural  network  classifications  obtained  using  the  A-Trous 
implementation:  7  voices  per  scale;  overall  classification  rate:  95.10  %. 
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Mean 

(STD) 

CRT 

306  files 

Sperm 

Input 

51  files 

Killer 

Input 

51  files 

Humpback 

Input 

51  flies 

Gray 

Input 

51  files 

Pilot 

Input 

51  files 

Earthquake 

Input 

51  files 

Sperm 

Output 

51  flies 

1.1107 

(0.0751) 

100  T 

51  file.s 

-0.0848 

(0.1879) 

OT 

0.1438 

(0.0100) 

0% 

-0.1224 

(0.3110) 

0% 

0.2516 

(0.0670) 

0% 

-0.0579 

(0.0481) 

0% 

Killer 

Output 

57  flies 

-0.0735 

(0.1462) 

OT 

0.9539 

(0.2745) 

98.04  % 

50  files 

0.0807 

(0.1903) 

0  9c 

-0.0271 

(0.1585) 

13.73  % 

7  files 

-0.0321 

(0.0450) 

0% 

-0.0340 

(0.1133) 

0% 

Humpback 

Output 

50  flies 

-0.0468 

(0.0856) 

OT 

0.2230 

(0.2996) 

0% 

1.0638 

(0.1425) 

98.04  9c 

50  files 

0.0304 

(0.1423) 

0% 

-0.0205 

(0.1759) 

0% 

0.0399 

(0.1198) 

0% 

Gray 

Output 

49  files 

-0.0916 

(0.0480) 

OT 

0.0776 

(0.3277) 

1.96% 

1  file 

-0.1028 

(0.3471) 

09c 

0.9014 

(0.1959) 

84.31  % 

43  files 

-0.0263 

(0.1633) 

9.80  % 

5  files 

0.0083 

(0.1737) 

0% 

Pilot 

Output 

47  files 

0.0221 

(0.2134) 

OT 

-0.0585 

(0.0892) 

0% 

0.0136 

(0.2466) 

0  9c 

0.0238 

(0.3758) 

1.96% 

1  file 

0.8450 

(0.1460) 

90.20  % 

46  files 

-0.0198 

(0.1775) 

0% 

Earthquake 

Output 

52  flics 

-0.1000 

(0.0249) 

OT 

-0.1159 

(0.0198) 

0% 

-0.1233 

(0.2570) 

1.96% 

1  file 

0.1588 

(0.3106) 

0% 

0.1851 

(0.1454) 

0% 

1.0276 

(0.2166) 

100% 

51  files 

Table  5.11  Distribution  of  neural  network  classiHcations  obtained  using  the  A-Troas 
implementation:  7  voices  per  scale;  overall  classiHcation  rate:  95.10  %. 


C.  CLASSIFICATION  RESULTS  OBTAINED  BY  COMBINING  AR 
COEFFICIENTS  AND  ORTHONORMAL  WAVELET-TYPE  PARAMETERS 


This  section  presents  classification  results  obtained  by  combining  the  ALE  AR  and 
wavelet  feature  coefficients.  This  technique  is  considered  in  this  study  to  investigate  the  results 
when  a  combination  of  dissimilar  approaches  is  applied.  The  premise  behind  this  approach  is  the 
combination  of  different  techniques  would  provide  unique  combinations  of  parameters  for  input 
into  the  NN.  The  combination  may  dramatically  increase  the  classification  rate  over  any  single 
technique  applied  alone.  It  is  the  most  labor  intensive  pre-processing  technique  applied  in  this 
study,  and  requires  a  large,  complex  neural  network 

1.  No  ALE  Pre-processing  Applied  to  the  Data 

Figure  5-9  and  Table  5.10  present  the  results  obtained  by  combining  the  reduced-rank 
coefficients  with  wavelet  -  type  parameters  using  the  Coifiet  3  basis  set.  Redundant  processing 
with  different  techniques  should  produce  better  results  than  either  technique  alone.  This  technique 
only  slightly  enhanced  the  classification  rate  of  eitha:  alone. 

Distribution  of  Neural  Net  Output  File 


Figure  5-11  Distribution  of  neural  network  classifications  obtained  using  the  combination  of 
reduced-rank  AR  coefficients  and  wavelet-type  techniques;  Coiflet  3  basis  set;  overall  classification 

rate:  86.64%. 
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Mean 

(STD) 

CR% 

300  files 

Sperm 

Input 

50  files 

Killer 

Input 

50  files 

Humpback 

Input 

50  files 

Gray 

Input 

50  files 

Pilot 

Input 

50  files 

Earthquake 

Input 

50  files 

Sperm 

Output 

53  files 

0.8461 

(0.2763) 

87.80  % 

44  filc.s 

0.1606 

(0.1659) 

17.07  % 

9  files 

-0.0659 

(0.0529) 

0  9c 

-0.0061 

(0.0842) 

09c 

-0.0896 

(0.0460) 

0% 

-0.0449 

(0.0233) 

0% 

Killer 

Output 

39  files 

0.1645 

(0.2900) 

9.76  % 

5  flies 

0.5969 

(0.3959) 

56.10% 

28  files 

-0.0847 

(0.0369) 

0  9c 

0.0968 

(0.2703) 

9.76  9c 

5  files 

-0.0699 

(0.0715) 

2.00  % 

1  file 

-0.0622 

(0.0374) 

0% 

Humpback 

Output 

49  files 

-0.0599 

(0.0521) 

0% 

-0.1201 

(0.0112) 

0  9c 

1.0559 

(0.1703) 

98.00  9c 

49  flies 

0.0145 

(0.1015) 

0  9c 

-0.1131 

(0.0203) 

0% 

0.1199 

(0.1784) 

0% 

Gray 

Output 

61  files 

0.0728 

(0.2474) 

2.00% 

1  file 

0.0736 

(0.1374) 

26.00  9c 

13  files 

-0.0933 

(0.0355) 

09c 

0.6169 

(0.2902) 

87.80  9c 

44  files 

0.0338 

(0.1887) 

4.88  % 

3  files 

0.0314 

(0.0767) 

0% 

Pilot 

Output 

46  files 

-0.0588 

(0.0635) 

0% 

0.0083 

(0.1818) 

0  9c 

-0.0897 

(0.0417) 

0% 

0.0631 

(0.2117) 

2.44% 

1  file 

0.8373 

(0.3109) 

90.24  % 

45  files 

0.0065 

(0.1939) 

0% 

Earthquake 

Output 

52  files 

-0.0510 

(0.0602) 

0% 

-0.0574 

(0.0525) 

0% 

-0.1120 

(0.0095) 

2.00  9c 

1  file 

-0.0380 

(0.0344) 

0% 

0.0266 

(0.0494) 

2.00  % 

1  file 

1.0388 

(0.1326) 

100% 

50  files 

Table  5.12  Distribution  of  neural  network  clas.sification.s  obtained  asing  the  combination  of  reduced- 
rank  .\R  coefTicients  and  wavelet-type  parameters;  Coiflet  3  basis  set;  overall  classification  rate: 


86.64%. 
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2.  ALE  Pre-processing  Applied  to  tiie  Data 

Figure  5-12  and  Table  5.13  present  the  results  obtained  by  first  pre-processing  the  data 
using  an  ALE  filter  and  next  computing  reduced-rank  AR  coefficients  in  combination  with  wavelet- 
type  parameters.  It  produced  an  overall  classification  rate  of  95.78  %.  The  results  are  impressive 
howeva,  they  came  at  the  cost  of  pre-processing  time  and  a  very  complex  neural  network. 
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Figure  5-12  Distribution  of  neural  network  classifications  obtained  using  the  combination  of  ALE 
pre-processing,  reduced-rank  AR  coefficients,  and  wavelet-type  parameters;  Coiflet  3  basis  set; 

overall  classification  rate:  95.78  % 
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Mean 

(STD) 

CRT. 

300  flies 

Sperm 

Input 

50  files 

Killer 

Input 

50  files 

Humpback 

Input 

50  files 

Gray 

Input 

50  files 

Pilot 

Input 

50  files 

Earthquake 

Input 

50  files 

Sperm 

Output 

54  flies 

1.0952 

(0.0771) 

100  9c 

50  files 

-0.0983 

(0.0363) 

0  9r 

-0.0070 

(0.0805) 

0  9'c 

0.0314 

(0.1403) 

7.32  <7c 

4  files 

-0.0933 

(0.0336) 

097 

0.0704 

(0.1420) 

0% 

Killer 

Output 

49  files 

-0.020] 

(0.1889) 

OTc 

0.9231 

(0.2554) 

92.109c 

46  files 

-0.0431 

(0.0627) 

2.00% 

1  file 

0.0231 

(0.1948) 

4.88  % 

2  files 

-0.0013 

(0.1276) 

0% 

-0.0960 

(0.0313) 

0  97 

Humpback 

Output 

48  flies 

-0.0190 

(0.0750) 

09c 

-0.0311 

(0.0517) 

OTc 

1.0365 

(0.2427) 

96.00  % 

i 

48  files 

0.0041 

(0.0776) 

0% 

-0.0207 

(0.1113) 

0  97 

0.0264 

(0.1517) 

0  97 

Gray 

Output 

47  files 

-0.0313 

(0.1497) 

0  9*^ 

0.1827 

(0.1662) 

7.80  91- 

4  files 

-0.0508 

(0.0481) 

0% 

0.7570 

(0.2290) 

86.00  9^0 

43  files 

-0.0581 

(0.1476) 

0  97 

0.0358 

(0.1704) 

097 

Pilot 

Output 

51  flies 

-0.0598 

(0.1699) 

0  9’c 

-0.0792 

(0.0574) 

0<7c 

-0.0163 

(0.1071) 

0% 

0.0622 

(0.1890) 

2.00  % 

1  file 

1.0204 

(0.1904) 

100  97 

50  files 

0.1038 

(0.2787) 

0% 

Earthquake 

Output 

51  files 

-0.0920 

(0.0379) 

09; 

-0.0602 

(0.0708) 

09r 

-0.1021 

(0.0481) 

2.00  % 

1  file 

-0.0459 

(0.0409) 

0  9^ 

0.0298 

(0.2530) 

O^c 

0.9372 

(0.1827) 

100  97 

50  files 

Table  ^.13  Distribution  of  neural  network  classifications  obtained  asing  the  combination  of  ALE 
pre-processing,  reduced  rank  AR  coefficients,  and  wavelet-type  parameters;  Coiflet  3  basis  set; 

overall  classification  rate:  95.78%. 


VI.  CONCLUSION  AND  RECOMMENDATIONS 


A.  CONCLUSION 

This  thesis  investigated  spectral  based  feature  extraction  techniques  to  input  into  a  back- 
propagation  neural  network  to  resolve  ocean  biologic  signals  from  those  produced  by  earthquakes. 
The  back-propagation  neural  network  proved  to  be  a  very  successful  means  of  classifying  the  six 
categories  of  signals  when  used  in  conjunction  with  a  good  feature  extraction  technique.  This 
thesis  implemented  a  neural  network  classifier  that  can  differentiate  between  earthquakes  and  five 
different  species  of  whale  with  an  overall  classification  rate  exceeding  95  %.  Specifically 
investigated  were:  2  categories  of  feature  extraction  techniques,  reduced-rank  auto-regressive 
modeling  and  discrete  wavelet  transform  based  techniques.  An  adaptive  line  enhanca  was 
investigated  to  improve  the  neural  network  results  by  removing  uncorrelated  noise.  The  specific 
results  for  the  AR  modeling,  the  discrete  wavelet  transform  using  Symmlet  8  and  Coiflet  3  wavelet 
transforms,  combining  AR  and  DWT  techniques,  and  finally  the  discrete  wavelet  transform  using 
an  A  Trous  method  are  summarized  in  the  following  paragraphs. 

The  reduced-rank  covariance  method  is  used  to  produce  AR  models  of  the  time  domain 
signals.  The  choice  of  model  number  was  determined  systematically  using  three  model  order 
prediction  techniques  as  discussed  in  Chapter  III.  The  determination  of  usable  singular  values  in 
the  algorithm  is  determined  visually,  which  proved  to  be  only  moderately  successful.  The 
combination  of  applying  the  ALE  and  using  the  reduced-rank  covariance  method  produced  results 
that  were  actually  worse  than  using  the  reduced  rank  method  alone.  The  most  successful  neural 
network  implementation  using  this  feature  extraction  technique  had  an  overall  classification  rate  of 
84.67  %  as  reported  in  Chapter  V. 

Two  implementations  of  the  Wavelet  Transform  were  considered;  the  Mallat’s  algorithm, 
and  the  A-Trous  algorithm.  Mallat’s  algorithm  computes  an  orthonormal  decomposition.  This 
technique  produced  only  moderately  successful  results  on  par  with  the  reduced  rank  AR  technique. 
The  combination  of  applying  the  ALE  and  using  the  orthonormal  wavelet  based  technique  was 
somewhat  more  successful,  however  still  not  producing  acceptable  results.  Investigated  were 
Symmlet  8  and  Coiflet  3  wavelet  families.  Without  ALE  pre-processing  the  neural  network  overall 
classification  rates  obtained  were  84.67%  and  78.05%.  Using  the  ALE  pre-processing,  the  results 
were  brought  up  to  95. 17%  using  the  Symmlet  8  basis  set.  The  Coiflet  3  basis  set  improved  to 
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88.76  /<• .  Of  note  is  the  Symnilet  8  results  are  on  par  with  the  AR  modeling  technique  with  NN  that 
are  two  thirds  the  size  of  using  AR  modeling. 

A  combination  of  reduced-rank  and  orthonormal  w'avelet  based  techniques  w'ere 
in\estigated.  The  premise  being  that  by  combining  the  tw'o  techniques,  a  unique  paring  of  the  tw'o 
methods  would  produce  better  results.  The  combination  produced  an  overall  classification  rate  of 
86.58‘7c.  Only  a  slight  improvement  w-as  realized  over  either  technique  alone.  This  combinadon 
pro\cd  to  be  \cr\  successful  w'hen  the  ALE  was  also  employed.  An  overall  classification  rate  of 
95.78ff  was  achieved.  This  level  of  processing  for  the  feature  extraction  is  very  labor  intensive, 
and  involved  using  a  large  NN  to  classify  the  signals. 

The  A-Trous  based  technique  lead  to  the  best  performing  technique  considered  in  this 
study.  This  technique  produced  consistently  highly  successful  re.sults  that  did  not  need 
augmentation  w  ith  a  noise  reduction  technique.  Two  of  the  three  highest  classification  rates  were 
produced  by  neural  networks  implementing  this  technique.  Three  of  the  four  NNs  presented  using 
this  technique  were  above  95^  in  the  overall  classification  rate.  The  technique  that  combined  a 
high  o\crall  classification  rate  and  a  small  NN  is  the  A  Trous  with  4  voices  per  scale.  The  overall 
classification  rate  achieved  was  96.41 7c  with  a  network  only  slightly  larger  than  that  used  for  the 
AR  modeling  technique.  Only  one  other  extraction  technique  had  a  better  overall  classification  rate, 
the  A  Trous  with  6  voices  per  scale.  The  increase  in  resolution  came  at  the  expense  of  a  very  large 
NN  and  thus  is  not  the  optimal  answer. 

The  back-propagation  neural  network  proved  to  be  a  very  successful  means  of  classifying 
the  six  categories  of  signals  when  used  in  conjunction  with  a  good  feature  extraction  technique. 

The  high  rate  of  success  was  achic\'ed  even  though  very  little  effort  was  made  to  optimize  the 
neural  networks  for  the  data.  Better  results  might  be  achieved  by  optimizing  the  neural  network 
architectures  for  each  feature  extraction  technique.  Improvements  may  also  be  realized  by 
employing  a  more  sophisticated  means  of  noise  reduction. 
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B.  RECOMMENDATIONS 


High  classification  rates  obtained  using  the  non-orthogonal  wavelet  based  techniques  are 
encouraging  and  warrant  further  study  regarding  the  optimization  of  the  NN  implementation. 

The  lack  of  success  of  the  AR  method  may  be  indicative  of  the  relative  simplistic 
algorithm  used  in  this  study.  ARMA  modeling  may  provide  a  better  solution  while  achieving  a 
reduction  in  the  order  of  the  model.  A  potential  improvement  of  the  reduced-rank  covariance 
technique  would  be  to  automate  self  adjusting  of  the  number  of  singular  values  chosen.  The  visual 
technique  employed  in  this  thesis  was  cumbersome  and  requires  human  intervention. 

In  addition,  an  improvement  in  the  de-noising  algorithm  may  be  achieved  by  employing  a 
wavelet  based  de-noising  technique  suggested  in  [7].  The  implementation  of  the  ALE  algorithm 
was  hampered  by  applying  the  same  technique  to  all  signals.  The  application  of  a  single  technique 
did  not  provide  the  optimum  de-noising  solution  to  any  of  the  signals,  yet  provided  an  engineering 
solution  to  de-noise  the  signals  while  not  significantly  degrading  any  single  class  of  signal. 

Wavelet  based  techniques  may  enhance  the  ability  to  de-noise  live  ocean  signals  better  than  the 
ALE  algorithm. 

Finally,  improvement  could  be  obtained  by  employing  a  neural  network  architecture  that  is 
capable  of  expanding  to  recognize  new  class  of  signals  vnthout  having  to  be  retrained  on  the 
signals  it  already  recognizes.  One  such  architecture  that  shows  promise  is  the  Fuzzy  ARTMAP 
architecture  mentioned  in  [10]. 
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APPENDIX 


ORDER.M 

function  [AIC,CAT,FPE,MDL,var]=order(data,P) 

%  ORDER  uses  the  AIC,  CAT,  FPE,  and  MDL  criteria  to  choose  the 
%  appropriate  AR  model  order.  Each  output  variable  is  a  P-by-1  vector 
%  with  the  quantities  indicated.  The  index  of  the  minimum  of  each  quantity 
%  is  the  best  theoretical  model  order.  The  variable  "var”  does  not 
%  yield  a  minimum  at  the  ideal  model  order;  its  usefulness  is  in  examining 
%  the  model  order  at  which  the  variance  becomes  "flat." 

% 

%  Usage:  [AlC,CAT,FPE,MDL,var]=:order(data,P) 

% 

%  Input:  data  Sequence  to  be  modeled 
%  P  Highest  model  order  to  test  (all  model  orders  from 

%  1  to  P  will  be  tested). 

% 

%  Output:  AIC  AIC  quantity  (vector  of  length  P) 

%  CAT  CAT  quantity  (vector  of  length  P) 

%  FPE  FPE  quantity  (vector  of  length  P) 

%  MDL  MDL  quantity  (vector  of  length  P) 

%  var  Theoretical  prediction  error  variance  (vector  of  length  P) 

%  ORDER  calls  the  function  BURG__A 

%  Written  by  K.  L.  Frack  Last  Update:  16  March  1994 

Ns=length(data); 

AIC=zeros(P,l); 

CAT=zeros(P,l); 

FPE=zeros(P,l); 

MDL=zeros(P,l); 

var=zeros(P,l); 

for  p=l:P; 

[a,var(p)]=burg_a(data,p);  %  Compute  variance  at  each  order 
AIC(p)=Ns*log(var(p))+2*p;  %  Compute  AIC  quantity 
SUM=0; 

for  pp=l:p  %\  Compute  CAT  quantity 

SUM=SUM+(Ns-pp)/Ns/var(pp); 
end 

CAT(p)=SUM/Ns-(Ns-p)/Ns/var(p); 

FPE(p)=var(p)*(Ns+p+l)/(Ns-p-l);  %  Compute  FPE  quantity 
MDL(p)=Ns*log(var(p))+p*iog(Ns);  %  Compute  MDL  quantity 
end 
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BURG_A.M 

function  [a.var]=  burg_a(ciata.P) 

Burg's  Method  Model 

This  function  solves  the  AR  model  coefficients  for  the  given  data  sequence 
using  Burg's  method. 

Usage:  [a.var]  =  bure_a(dataR) 

Input:  data  Data  sequence 
P  Desired  model  order 

<rr 

Output:  a  Vector  with  "a"  parameters  ([1  al  a2  ...  aP]) 

^  var  Prediction  error  variance 

Tr  Written  by  K.  L.  Frack  Last  Update:  15  March  1994 

[drow  dcol]=size(data) 
if  dcol  >  1 
data=data': 
end 

N=length(data): 
ef=dataf2:N): 
eb=datari:N-l): 
a=[l]; 

var=data'*data/N; 
for  k=l:P 
L=lengthreO: 

gamrk;)=(2=^cf*eb)/(ef*ef+eb’*eb);  Burg's  algorithm  performed  for 
tef=ef(  2:L)-conj(gam(k))*cb(2:L);  9cl  each  of  the  p  iterations 
eb=cb(l:L-l)-gani(k)*ef(l:L-l); 
ef=tef; 

a=[a;  0]<onj(gainfk))*[0;  flipud(conj(a))];  %  update  "a"  vector 
var=var’'(l-abs(gam(k))''2);  %  Update  error  variance 
end 

ARWHALE.M 

function  faw']=arwhalc(q) 

^  .AR  model  of  signal  q  using  the  reduced-rank  covariance  method 

method  reduces  rank  of  estimared  correlation  matrix  using  the  singular  value  decomposition 
Jr  User  visully  picks  number  of  singular  values  to  keep  program  plots  FFT  ofsisnal  vs  frequencvresponce 
V  of  .AR  model  of  segment 

Returns  matrix  of  a  coefficients  for  model  segments  [25  X  number  of  seomcntsl 
Written  by  LT  R.C. Bennett.  USN. 
f'r  ref  ar_covar.m  bv  LT  D.BrowTi  USN 

fr 

fr  last  modified  90ct94 

n=[l:512]': 

fs=l: 

findex=l:512/2: 


9c  \  Ensures  the  data  is  a  column  vector 


9c  Initial  forw'ard  prediction  error  vector 
9c  Initial  backward  prediction  error  vector 
9c  Initial  "a"  vector 
9c  Initial  error  variance 
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freq=findex*fs/5 1 2; 


Mo=25;  %%%%  model  order 

%%%  data  is  a  multiple  of  512  segments 

numseg=(floor(length(q)/512));  %%%  number  of  segments 

disp(['  the  number  of  segments  are  ',num2str(numseg)]) 
aw=zeros(Mo+l  .numseg); 

for  seg=l  inumseg  %%  recursion  to  go  through  all 

%%  segments  of  one  "whistle" 

disp([’segment  ',num2str(seg)]); 

%%%%%%%%%%% 
segment=q((seg*5 12)-5 1 1  :(seg*5 12)); 

%%%%%%%%%%  call  svd_cov.m 
[a,b]=svd_cov(segment,Mo); 

[H,W]=ffeqz(b,a,512, 'whole'); 

Par=abs(H); 

MagPar=20*logl0(Par+eps); 
aw(:,seg)=  a; 

%%%% 

fg=abs(fft(segment,5 1 2)); 

SS=20*log  10(fg+eps) ; 

%SS=SS(l;length(SS)/2+l); 

subplot(2,l,2),plot(freq,MagPar(findex),'g',fireq,SS(findex),'b') 

title(['Frequency  Response  of  Model  Segment  ',num2str(seg),'  and  Segment  Spectral  Content ']) 
xlabelC  Sampling  Frequency ') 
ylabelCMagnatude  dB') 

end  %%%end  of  for  loop 

$VD_COV.M 

function  [ahat,b]=svd_cov(x,Mo) 

%%%  function  ahat=svd_cov(x,Mo) 

%%%  compute  the  AR  coefficients  of  the  data  vector  y 
%%%  Mo  is  the  maximum  model  order. 

%  This  method  combines  the  covariance  method  and  the  svd 

%  method  of  rank  reduction  to  model  a  segment  of  biological  data.  This  method 

%  reduces  the  noise  by  eliminating  the  singular 

%  values  assodiated  with  them 

%Written  by  LT  R.C.Bennett,  USN 

%  last  modified  90ct94 

%  ref  AR_COVAR(x,P)  by  Dennis  Brown, 

%  ref  Therrien.Disaete  Random  Signals  And  Statistical 
%  Signal  Processing,  1992,  s  9.4  &  10 

%  default  returns 
a=[];s=[];R=[];X=[]; 
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^  figure  out  if  \vc  have  a  vector 
if  min(si7e(x))~=l. 

errorCInput  vector  arg' x"  must  be  an  NXl  or  an  IXN  vector  ’)* 
end:  ^cif 

^reshape  vector  to  an  Nxl 
x-x(:): 

compute  power  in  data 
pd=  sum(x.*x): 
len=Iength(x): 

normalize  energy 

x=x7sqrt(pd): 

pad  the  data  vector  out 
x=[x:  zeros(Mo,l)]: 

^c^c^c^c^c^c^cfonn  the  data  matrix  X 
X=  zeros(length(x),M(>fl); 
t=  length  (x): 
for  k=l:McH-l 
X(;.k)=:x: 
x=[xa:t);x(l:t-l)]: 
end:  ^for 


take  only  non  zero  padded  values 
X=X(Mcvf  l:length(x)-Mo,:); 


estimate  correlation  matrix 

R=:X'*X: 

[m.n]=size(R): 

solve  using  SVD  version  of  normal  equations 
^conjugate  R 
R=flipudrflipIrfR)): 

^  solve  for  a  coefficcints 
Rx=Rr2:m,2:n); 
p= -(RG:m,l)): 

[U.S.Vj=svd(Rx): 

lamda=diag(S); 

figureri). 

subplotG,!,]),  stemriamda.’b')  want  to  sec  the  singlar  values 

^c9r9c  of  the  signal  so  to 

title<'[  Singular  Values  of  Segment '])  ^^^separate  the  signal  from  the  noise 
princip=ginput:  9c9c9c  pick  max  number  of  singular  values 

^[princip]=find(lamda  >  0.5): 

"^P^^^^^P^iticip)  9c9c9c  want  to  invert  only  principle  coomponents 

pc=f!oor(princip(  LI)) 

^Iamda=lamda-princip(L2):  ^^r9r subtracts  noise  power 

9r99r99c9c9c9f9r9c  For  psuedo  inverse 
s=  L/lamdaf  Lpe): 
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Spi=diag(s); 

RXi=  V(:,l:pc)*Spi*U(:,l:pc)';  %%%  invert  only  principle  components 

%%%%%%%%%%%  a  coeficients 


a=RXi*p; 

ev=(R(l ,  1  )+sum(a.  *R(  1 ,2:n))); 
pev  =ev/len; 
aliat=[l;a]; 

%%%%%%%%%%  generate  model,  compute  power 
model  =  filter(l,ahat,[l;zeros(len-l,l)]); 
pm  =  sum(modeL*model); 


%%%  error  variance 
%%%  prediction  error  variance 
%%%  add  aO  =  1 


%%%%%%%%%%  compute  gain  (bO  in  AR  Model)  Note  This  is  a  Fudge  following  D.Brown  &  prof 
Fargues 

b  =  sqrt(pd/pm); 

%%%%%%%%%%  return  as  a  row  vector 


%ahat  =  reshape(ahat,l,length(a)); 
%b  =  reshape(b,l, length  (b)); 


LMSALE.M 

function  [w,y,e]=lmsale(x,M,mup,delay) 

%LMSALE  Adaptive  least-mean  square  line  enhancer. 

%  [W,  Y,E]  =  LMSALE(X,M,STEP,DELAY)  implements 

%  an  adaptive  line  enhancer  using  the  least-mean 
%  squares  approach  where  X  is  the  input  signal, 

%  M  is  the  number  of  filter  weights,  STEP  is 
%  the  percentage  of  the  maximum  step  size  to  use 
%  in  computing  the  next  set  of  filter  weights  and 
%  DELAY  is  the  number  samples  to  delay  X  in 
%  computing  the  new  weights.  The  final  filter 
%  weights  are  returned  in  W,  the  estimated  signal 
%  in  Y  and  the  error  signal  in  E. 

% 

%  The  maximum  step  size  is  computed  as 
% 

%  maxstep  =  2/(M  *  std(x)^2); 

% 

% 

%  [W,Y,E]  =  LMSALE(X,WIN,STEP,DELAY)  uses  the 

%  vector  WIN  as  the  initial  guess  at  the  weights. 

%  The  number  of  weights  is  equal  to  the  length 
%  of  WIN. 

%  LT  Dennis  W.  Brown  3-10-93 
%  Naval  Postgraduate  School,  Monterey,  CA 
%  May  be  freely  distributed. 

%  Not  for  use  in  commercial  products. 

%  default  returns 
y=[];w=[];e=[]; 
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Tr  check  number  of  input  args 
if  nargin  -=  4 

errorr'Imsalc:  Invalid  number  of  input  arguments../); 
end 

^  figure  out  if  we  have  a  vector 
if  minfsizefx))  1. 

errorrimsale:  Input  arg  "x"  must  be  a  IxN  or  Nxl  vector.'): 

end; 

work  with  Nxl  vectors 
X  =  xf:); 

initial  weights 
wO  =  zeros^Ml); 

if  nargin  <  5 

pi  =  0; 

end: 

some  parameters 


Ns=length(x); 
runs  =  std(x)^2: 

^  compute  maximum  step  size 
mumax  =  2/(M  *  runs); 

make  mu  less  than  2/lamdamax  using  percentage 
mu  =  mup/100  mumax: 

^  start  with  initial  guess  for  w^eights  equal  to  the  null  vector 
w  =  wO: 


recursively  compute  the  weights  to  three  significant  figures 
y=zeros(Ns.  1 );  %  space  for  filtered  signal 

e^zerosfNs.l);  9c  space  for  error  signal 

xi  =  [zerosfM-1.1)  ;  x(l:M+delay.l)]; 

9r  initial  conditions  set  to  zero 
for  k~dclay+l:M+delay-l 

b=  flipud(xi((k-M+l:k)-dclay+M-l)); 

9c  compute  filter  output 
y(k)  =  w'  b: 

9  compute  error 
efk)  =  x(k)  -  y(k): 

9c  compute  new  weights 
w  =  w  +  mu  *  b  *  e(k): 
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end 


%  rest  of  data 
for  k=M+delay:Ns 

b  =  flipud(x((k-M+l:k)-delay)); 

%  compute  filter  output 
y(k)  =  w’  *  b; 

%  compute  error 
e(k)  =  x(k)  -  y(k); 

%  compute  new  weights 
w  =  w  +  mu*b*  e(k); 


end 


ECOEFF2.M 


function  [Ew,Ell]  =  Ecoeff2(signal) 

%  Ecoeff2.m  generates  the  energy  per  scale  of  the  wavelet  decomposed  signal 
%  using  Coiflet,3  orthogonal  wavelet  transformation  This  program  is  also  used  compare  wavelet 
%  coefficients  by  changing  the  Wave_Type  parameter. 

%  Written  by  LT  R.  C.  Bennett 
%  last  updated  10/16/94 

%  calls  Wave_Type.m,  MakeONFilter.m  and  FWT_PO.m  from  Wavelab  toolbox 
% 


Wave_Type  =  'Coiflef ;  par=3; 

signal  =signal/std(signal); 

qmf=  MakeONFilter(Wave_Type,par); 


Ew=[];Ell=[]; 

L=2;  %  course  level 
wsig=FWT_PO(signal,L,qmf); 

[n,J]=dyadlength(wsig); 

forj=:J-l:-l:L 


%  normalizes  the  energy  of  the  signal  to  1.0 
%  calls  Teach  Wave  MakeOnFilter  to  build 
%  the  QMF  Filter  Bank 


%  Transforms  signal  returns  vector  of  all 
%  wavelet  high  pass  coefficents 
%  dyad  is  an  octave  index  power  of  two 


%  average  energy  in  high  pass  coefficients 
Ew=[Ew,sum(wsig(dyad(j)).'^2)Aength(wsig(dyad(j)))]; 


%In verse  transform  for  low  pass  coefficents 

l(:,J-j)=zeros(n,l); 

l(dyad(j)J“j)=wsig(dyad(j))’;  clear  temp 
temp=rWT_PO(l(:,J-j)o,qmf); 

11(:  J--j)=temp;  %  identify  low-pass  coefs 

Ell= [EIL  sum  (temp.  ^2)/length  (temp)];  %  Ip  coefs  energy  per  scale 


end 


%  for  j 
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INWVLET.M 

function  [E\\.  Ell)  =  InWvlet(signal) 

‘^InWvlct.m 

function  to  segment  data  into  lengths  of  512  samples, 
and  call  Ecoeff.m  to  get  the  energy  per  wavelet  scale 
^  Written  by  LT  R.C.  Bennett 
last  updated  160ct94 
x=signal; 

num,seg=flooraength(signal)/512);  number  of  segments 

di.sp([’  Number  of  segments  are  ',num2str(numseg)]) 
EwrizerosfT.numseg); 

Ell-zeros(7.numseg); 


forseg=  l;numseg 
disp(['segment  num2str(seg)]) 

X 1  l=x((seg*5 1 2)-5 1 1  :(seg*5 1 2))'; 
call  Ecoeff2.m 
[ew.ell)=Ecoeff(xll); 
[ew.elI]=Ecoeff2(xl  1); 

ew=ew'; 

ell=cir: 

Ew(:.seg)=ew; 

ElK:.seg)=elI; 

end 


%%recursion  to  call  Ecoeff  for  all  segments  of  signal 


9c  9c  for  seg  loop 
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SHENDWT2.M 


%  This  MATLAB  function  calculates  an  approximation  of  the 
%  DISCRETE  WAVELET  TRANSFORM  of  a  function 
%  using  SHENSA’S  ATROUS  ALGORITHM  discribed  in  "The 
%  Discrete  Wavelet  Transform:  Wedding  the  A  Trous  and 
%  Mallat  Algorithms,"  IEEE  Transactions  on  Signal 
%  Processing,  Vol.  40,  No.  10,  Oct.  1992. 

% 

%  The  function  syntax  is: 

%  [W,beta,nu]=shendwt2(s,  M,  P_i,  P_f,nc) 

%  where:  "s"  represents  a  vector  of  data  to  be  transformed 
%  "M"  is  an  integer  indicating  the  number  of  "voices" 

%  tu  be  used  to  cover  the  frequency  spectrum 

%  "W"  is  an  array  containing  the  coefficients  of 

%  the  magnitude-squared  wavelet  transform 

%  of"s." 

%  "P_i"  represents  the  first  wavelet  transform  scale 

%  to  be  calculated 

%  "P_f  ’  represents  the  final  wavelet  transform  scale 

%  The  transform  is  calculated  using  a  madulated  Gaussian 
%  window  as  an  analyzing  wavelet.  If  multiple  voices  are 

%  specified,  the  projection  of  "s"  on  each  voice  will  be 

%  represented  separately.  The  Lagrange,  a  trous  interpolation 
%  filter  used  is  obtained  from  convolving  a  four-point 
%  Daubechies  scaling  filter  with  itself. 

%  Written  by  N.A.  Hamlett,  1993  [16] 

%  Modified  by  M.P.  Fargues  1994 

function  [W,beta,nu]=shendwt2(s,  M,  P_i,  P_f,nc) 


% 

%  First,  the  argument  vector  "s"  is  conditioned.  If  "s"  is 
%  defined  as  a  column  vector,  it  is  converted  to  a  row  vector. 

%  Secondly,  "s"  is  zeropadded  to  the  next  integer  power  of  "2." 

% 

%iopt=input(’option  to  use  log  scale:  y/n  1/0  ’); 
iopt=0; 


[rows,cols]=size(s); 

% 

if  cols  ==  1 

s=s.'; 

end 

clear  rows;clear  cols; 

% 

s=zeropad(s,  1 ,2^ceil(log(length(s))/log(2))); 

% 

%  Default  values  are  imposed  for  starting  and  ending  scales  if 
%  none  are  specified. 

% 

if  existCPJ’)  ==  0 
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PJ=1: 

end 

if  cxist('P_r)  -=  0 

P_f=floor(log(length(s))/log(2)); 

end 

If  the  number  of  voices  is  not  specified,  a  default  value  of 
fr  M=:2  is  imposed. 

if  existf’M')  ==  0 
M=2; 
end 

Next  the  analwJng  wavelet  must  be  calculated.  The  starting 
^  point  of  this  process  is  to  specify  the  Gaussian  window  rolloff 
factor  "beta”  in  accordance  with  the  specified  number  of 
voices.  (Shensa  (6.31)).  If  M^l.  the  value  of  "beta”  is  defined 
as  ”pi/(4*sqrt(2)).” 

^Tefopt^inputfautomatic  selection  of  beta  and  k:  y/n  (1/0):  '); 

fopt=0: 
if  fopt==0 

^  beta=input(’beta:  ’); 
beta=  .15: 
nu=input(nu:  '); 
nu=  .85: 

if  nu“0.nu=pi-sqrt(2)*beta:end 
else 

ifM  ==  1 
9c 

beta=pi/(4*sqrt(2)); 


9c  beta=l/(2^M); 

The  location  of  the  center  frequency  "nu"  is  assigned  in  accordance 
^  with  Shensa  (6.27). 

end 

Tfnu=pi-sqrtf2)*bcta; 

end 

•Tf  The  voice  scaling  factor  "a"  is  calculated  usins  Shensa  (6  32) 
a=2'f]/M); 


‘7-  The  region  of  support  for  the  analvvang  wavelet  filter  "g"  is 
approximated  as  the  region  for  which  the  Gau.ssian  window 
is  greater  than  10^-3.  Consequently,  the  filter  impulse 
Cf  rcspon.sc  domain  is  [-a^(M-l)*.sqrt(14)/bcta.  a''(M-l)*sqrt(14)/bcta]. 
*7-  The  length  of  "g"  is  represented  bv  an  odd  integer 

n=-ceil(a%\M)*.sqrt()4)/bcta):ccil(a''(M-l)*sqrt(14)/bcta): 
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% 

%  The  analyzing  wavelet  is  calculated  for  each  voice  in  accordance 
%  with  Shensa  (6.32).  It  is  then  nonnalized  such  that  its  peak 
%  passband  frequency  response  is  unity. 

% 

fork=l:M 

% 

g(k,:)=exp(j*nu*(n/(a^(k-l)))).*exp(-(beta^2*(n/a^(k-l)).^2)/2); 
%  g(k,:)=g(k,:)/max(abs(freqz(g(k,:), 1,512))); 

end 

clear  n; 

% 

%  A  variable  "G"  indicating  the  half-length  of  filter  "g"  is 
%  defined. 

(j=ceil(iength(g)/2) 

% 


%  Next,  the  Lagrange  interpolation  filter  is  calculated.  The  filter 
%  is  obtained  from  "auto-convolution”  of  a  Daubechie  four-point 
%  DWT  filter. 

% 

f=[l+sqrt(3)  3+sqrt(3)  3-sqrt(3)  l-sqrt(3)]/(4*sqrt(2)); 

% 

f=conv(f,fliplr(f)); 

f=f/sqrt(2); 

% 

%  A  variable  "F"  indicating  the  half-length  of  "f '  is  defined. 

% 

F=ceil(length  (f)/2) ; 

% 

%  The  recursion  is  next  executed  according  to  Shensa  (2.12a  &  b). 
%  The  sequence  is  filtered  and  decimated  the  first  "P_i"  times. 

% 

fork=l:P_i-l 

% 

s=conv(s,f); 

% 

s=s(F:2:length(s)); 

% 

end 

% 

%  The  output  matrix  "W"  is  initialized  as  a  zero  vector  of 
%  dimensions  identical  to  those  of  "s." 

% 

W=zeros(size(s)); 

% 

fork=P_i:P_f 

% 

%  The  data  vector  "s"  is  first  filtered  with  each  voice  of  "g," 

%  The  squared  magnitude  of  the  result  is  assigned  to  the  appropriate 

%  column  of  ”W." 

% 

for  n=(k-P_i)*M-hl:(k-PJ+l)^M 
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Tr  The  row  of  "W"  to  be  evaluated  is  initialized  as  zero 

r. 

W(n.:)=zcros(sizcfW(T:))); 

The  magnitude  of  the  filter  output  is  calculated 
Tr  and  assigned  to  "WT.” 

\\T=abs(conv(s,g(n-(k-PJ)*M,:))); 

^^T_ph=  conv(s.g(n-(k-PJ)*M,:)); 

The  elements  of  Wk”  are  assigned  to  the  corresponding 
^  columns  of  the  row  of  "W"  being  calculated. 


rrW(n.2"'(k-PJ):2^(k-PJ);Iength(W))=Wk(G;length(Wk)-G+l); 

if  iopt==0 

W(nJ:2"'(k-PJ):lengthGV))=Wk(G:length(Wk)-G+l); 

else 

forkl=G:lengthrVW 
if  \VTrkl)<eps.  'WT(kl)=-100; 
else  \Vk(kl)-]0*Iog(Wk(k]));end 
end 

\V(n,l:2^(k-PJ):lcngthGV))=Wk(G:length(Wk)-G+l); 

end 

end 


’’s"  is  filtered  with  the  lagrangc  interpolation  filter  and  then 
decimated. 

s=conv(s.O: 

s=s(F:2:length(s)-Fk 

end 

^  plot 
^figure. 

[mw,nw]=sizerW); 

^rmesh(rot90(absrVV))) 

‘^ylabeirtime') 

^xlabeK'octave') 

figure 

contourGV.nc.  1  :n  w,(0:mw- 1  )/M) 
s=['D\\T  a-trous  ’,num2str(M),'  voices’]: 

^^-titlers); 

x!abeirtimc’):\iabelf  octave’) 


LOADW3V7tM 

fr  Loads  feature  extracted  data  .  in  this  instance  the  data  from  A  Trous  7  voices  per  scale  data, 
fr  was  used  in  all  cases  to  load  data,  append  the  target  data  and  write  in  NeuralWorks 
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%  *\nna”format.  This  version  losds  the  testing  file,  the  training  file  used  same  format  exceptloaded  the 
%  first  86filesfor  each  classification  of  signal. 

%  loadw3v7t.m 
%  Written  by  LT  R.C.  Bennett, 

%  8DEC94 

clear  %  load  feature  extracted  data  matrix 

load  ew3v7.dat; 

load  gw3v7.dat; 

load  hw3v7.dat; 

load  kw3v7.dat; 

load  sw3v7.dat; 

load  pw3v7.dat; 

e=ew3v7(:, 87:137); 

g=gw3v7(:,87: 137);  %scale  each  input  to  be  same  size  per  whale 

k=kw3v7(:,87:137); 

s=sw3v7(:,87:137); 

p=pw3v7(:,87:137); 

h=hw3v7(:,87:137); 

ae=size(e); 

ag=size(g); 

ah=:size(h); 

ak=size(k); 

as=size(s); 

ap=size(p); 

whales=[  s  k  h  g  p  e]'; 

%  Targets  %%  appends  the  known  classification  to  the  end  of  the  train/test  file 

targetl=:zeros(6,as(  1 ,2)); 
target  1  ( 1 ,  :)=ones(l  ,as(  1 ,2)); 
target2=zeros(6,ak(  1,2)); 
target2(2, :  )=ones(  1  ,ak(  1 , 2)) ; 
target3=zeros(6,ah(l,2)); 

target3(3,:)=ones(l,ah(l,2)); 
target4=zeros(6,ag(l  ,2)); 

target4(4,:)=ones(l,ag(l,2)); 

target5=zeros(6,ap(l,2)); 

target5(5,:)=ones(l,ap(l,2)); 
target6=zeros(6,ae(l  ,2)); 

target6(6, :  )=ones(  1  ,ae(  1 ,2)) ; 
targets=[targetl  target2  target3  target4  ... 

target5  targetb]; 
targets=targets'; 
x=[whales  targets]; 

%%%%%%%%%%%%%  rewrite  in  .nna  format 
x=x'; 

[nlines,  ncols]=size(x); 

N_iines=floor  (nlines/6) ; 

N_end=nlines-NJines*6; 

%%%%%  write  .nna  file 
fid=fopen('w3v7t.nna’,’w’); 
for  kp=  lincols 
%  first  line 

fprintf(fid/  %11.8f  %11.8f  %11.8f,  x(l,kp),  x(2,kp),  x(3,kp)) 
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fprintfrfid.'9cn.8f  ^cll.8f  %11.8AnV.  x(4.kp),  x(5.kp).  x(6,kp)) 


for  kl=l:NJines-l  felines  2  to  end 

fprintf(fid.'&  9cU.Sf  ‘Trll.Sf  1.8f,x(6*kl+l.kp).x(6*kl+2.kp).x(6*kl+3.kp)) 
fprintfCfid.’  9cll,8f  rrl].8f  %11.8An\r\x(6*kl+4,kp).x(6*kl+5,kp).x(6*kI+6,kp)) 
end  ‘Tfforkl 

if  N_end  -=0  9c  complete  for  the  last  line 

kl=N_lincs; 

if  Mend  >=  1.  fprintf(fid.'&  %11.8f,x(6*kl+l,kp));end 
if  Mend  >=  2.  fjjrintfffid.’  1.8f,x(6*kl+2,kp));end 
if  Mend  >=  3.  fi)rintf(fid.’  %11.8f,x(6*kl+3,kp));end 
if  Mend  >=  4.  fprintf(fid.'  ^fl].8f,x(6*kl+4.kp));end 
if  Mend  >=  5.  fprintf(fid.’  ^rll,8f,x(6*kl+5.kp));end 
end 

fprintffrid.’\n\r'); 
end  ^forkp 
fcloseffid) 

ImuTiie  w3v7t.nna  a: 

^leject 


SPECT.M 


function  [GG,X]==spect(N_voic.Max_sc.bcta,nu_pi) 

^  N_voic:  number  of  voices 
^  Max_sc:  maximum  scale 

^  beta  and  nufin  fraction  of  pi)  as  defined  in  Shensa 

^  Written  by  M.P.  Fargucs  1993 

function  [GG,X]=spcct(N_voic,Max_sc.beta,nu_pi) 

9^clg 

ip=inputrplot? 

M=N_voic:Nmax=Max_sc;nu=nu_pi*pi: 
clear  G  GG 
nsamp=  1 00;nover=50; 
nsampt=nsamp+2*novcr; 

Ntoi=Nmax*nsampt:  NS=M*Nmax: 
for  i(}=l:Nmax 
i2=2"'(i0-l): 

for  i=-nover:nsamp+nover-l 
ii=i+novcr+l: 

omeea=pi/(2^i0)  +  (pi/(2^i0))/nsamp*(i); 

for  k=l:M 

kk=k-l: 

tcmp=expf-(i2'^2^(kk/M)^^omcga  -nu)"^2/(2*bcta^2) ); 
ksamp=(iO-l)=^M+k; 

G('NS+l-ksamp.Ntot-iO*nsampt+ii)=i2*sqrt(2*pi)*2''(kk/(M))*temp/bcta' 

end 

end 

end 

1=0: 

for  k=l:Nmax-l 
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fork2=l:M 

temp=dec(  G(M*(k-l)+k2,nsampt*(k-l)+l:nsainpt*k),2^(Nmax-k)); 
temp2=G(M*(k“l)+k2,nsainpt'^(k-l  )+l  :nsampi*k); 
nl=length(temp); 

if  k==:l,  GG(k2,l:nl)=temp;  10=(nsamp+nover)/(2^(Ninax-k)); 
else 

ll=l-nover/(2^(Nmax>k)); 

GG(  M*(k-l)+k2,ll+ 1  :ll+nl)=lemp;10=nsaiiip/(2^(Nmax-k)); 
end 
end 
1=1+10; 
end 

nl=length(G(M*(Ninax-l)+l:M*Ninax,nsainpt*(Nmax-l)+l:nsainpt*Nmax-l)); 

ll=l-nover; 

GG(M*(Nniax-l)+l  :M*Nmax41+l  :ll+nl)=G(M*(Nmax-l)+l:M*Ninax,nsainpt*(Nniax- 
1 )+ 1 :  n  sampt*Nmax- 1 ) ; 

naxis=leng  th  (GG) ; 

%  xl=2^(-(Nmax-l));x2=l+(nover-l)/(2*nsamp); 
xl=2^(-(Ninax))*(l-(nover/nsamp));x2=l+(nover-l)/(2*nsainp); 
for  q=l:naxis 

X(q)=x  1 + (q>  1 )  *  (x2-x  1  )/naxis; 
end 


X=X/2; 

%subplot(211),plot(G’) 

if  ip==:l 

%subplot(212) 

plot(X,GG') 

title([’A  Trous  ImpL:’,num2str(M),'  voice(s)  -  beta:',num2str(beta)/  nu:’,num2str(nu_pi),‘pi']) 

xlabel(‘normalized  frequency’) 

ylabel(’magnitude’) 

end 


CALLOUTM 

%  callout.m 

%  Written  by  LT  R.C.  Bennett 
%  8DEC94 

%  loads  NN  output  files  for  display  and 
%  statisics  for  Chapt  5  for  mean  and  std  of  output  nodes  of  NN. 
%  calls  outnnr.m 
clear 

load  c:\nw2v50\w3v7t.nnr  %  non-orthogonal  7  voices 
load  c:\nw2v50\w3v4t.nnr  %  non-orthogonal  4  voices 
load  c:\nw2v50\w3v5t.nnr  %  non-orthogonal  5  voices 
load  c:\nw2v50\w3v6t.nnr  %  non-orthogonal  6  voices 

load  c:\nw2v50\tar.nnr  %  AR  model  coefficients  alone 

load  c:\nw2v50\ttle_5.nnr  %  ALE  AR  models 

load  c:\nw2v50\wvl2tle.nnr  %  ALE,  Wavelet,  Coiflet  3 
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load  c:\n\v2v50\\\'\*leiLle.nnr  %  ALE,  Wavelet.  Syininlet  8 
load  c;\n\v2v50\\\'\'lcttst.nnr  ^  Wavelet  alone.  S^mmlet  8 
load  c:\n\v2v50\\\'\'hv2tst.nnr  ^  Wavelet  alone.  Coiflet  3 
load  c:\n\v2v50\botha\vts.nnr  AR  &  Wavelet,  Coiflet  3 
load  c:\n\v2v50\btstw2.nnr  9c  ALE,  AR,  &  Wavelet.  Coiflet  3 

figure  9cl  Voices  Per  Octave 

\vaterfal(\v3v7i(:.  7;  12)’),  titlc(’Distribution  of  Neural  Net  Output  File'). 

ylabeirClass’) 

xlabeirXest  File  Number’). 

axis([0  max(size(\v3v7t))  0  6  min(min(w3v7t))  max(max(w3v7t))]),  view([23,82]) 

^[mn.sd]=outnnr(\v3v7t) 

print  7voice  -deps 

Ccrigure.meshc(mnf:.7:12)).  title(’Mcan  of  \v3v7l') 
figure  9c4  Voices  Per  Octave 

\vaterfal(\v3v4t(:,  7:12)'),  title(’Distribution  of  Neural  Net  Output  File’), 

ylabeirClass') 

xlaboKTest  File  Number’) 

axis<^[0  max(sizc(w3v4t))  0  6  min(min(w3v7t))  max(max(\v3v7t))]),  view([23,82]) 

^[mn.sd]~outnnr(\v3v4t) 

print  4voice  -deps 

^rfigure.meshc(mn(:.7:12)).  titIe('Mean  of  \v3v7t’) 
figure^5  Voices  Per  Octave 

\vaterfaK\v3v5t(:,  7:12)’),  title('Distribution  of  Neural  Net  Output  File'), 9c 9c 9r  9c 9c 

xlabeKTest  File  Number') 

yiabel('Class’). 

axisr[0  max(si7.e(\v3v5t))  0  6  min(min(w3v5t))  max(max(\v3v5t))]).  view([23.82]) 

^[mn.sd]=outnnr(\v3v5t) 

print  5voice  -deps 

Cc figure. meshc(mn(:,7: 12)),  titleCMean  of  w3v7t') 
figured 6  Voices  Per  Octave 

\vatcrfairw3v6tr:,  7:12)'),  titlcCDistribution  of  Neural  Net  Output  File’), 
xlabcK'Test  File  Number’) 
ylabel  (’Class'). 

axis(^fO  max(sizc(\v3v6t))  0  6  min(min(w3v7t))  max(max(\v3v7t))]).  vie\v([23,82]) 
^[mn.sd]=outnnr(\v3v6t) 

Crrigure.meshc(mn(:,7:12)).  litleCMcan  of  w3v7t') 
print  6 voice  -deps 

figure  ^tar  AR  Coefficients 

water  fair  tar  (:.  7:12)’).  titlcCDistribution  of  Neural  Net  Output  File’), 
xlabeirTest  File  Number’) 
ylabel  r'Class') 

axisrfO  max(sizcrtar))  0  6  min(min(\v3v7t))  max(max(w3v7t))]),  vicw([23,82]) 

^7[nm.sd]=outnnr(tar) 

print  tar  -deps 

^rrigure.mcshcrmn(:.7:12)).  titleCMcan  of  vv3v7t') 
figure  9c  ttlc__5  ALE  AR  Coefficients 

watcrfal(ttle_5r:.  7:12)').  titlcCDistribution  of  Neural  Net  Output  File'). 
xlabeirTest  File  Number’) 
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ylabel('Classification’) 

axis([0  niax(size(ttle_5))  0  6  min(min(w3v7t))  max(inax(w3v7l))]),  view([23,82]) 

%[mn,sd]=outiinr(ttle_5) 

print  ttle_5  -deps 

%figvire,meshc(mn(;,7:12)),  title('Mean  of  w3v7t') 

figure  %  wvlettst  Wavelet  (Symmlet  8)  Energy  per  octave 
waterfal(wvlettst(:,  7:12)'),  title('Distribution  of  Neural  Net  Output  File'), 
xlabeK’Test  File  Number') 
ylabel('Class') 

axis([0  max(size( wvlettst))  0  6  min(min(w3v7t))  max(max(w3v7t))]),  view([23,82]) 

%[mn,sd]=oumnr3(wvlettst) 

print  wvlet  -deps 

%figure,meshc(mn(:,7:12)),  title('Mean  of  w3v7t') 

figure  %  wvlw2tst  Wavelet  (Coiflet  3)  Energy  per  octave 
waterfal(wvlw2tst(:,  7:12)'),  title('Distribution  of  Neural  Net  Output  File'), 
xlabel('Test  File  Number') 
ylabel('Class') 

axis([0  max(size(wvlw2tst))  0  6  min(min(w3v7t))  max(max(w3v7t))]),  view([23,82]) 

[mn,sd]=oumnr3(wvlw2tst) 

print  wvletCS  -deps 

%figure,meshc(mn(:,7:12)),  title('Mean  of  w3v7t') 
figure  %  wvlettle  ALE  Wavelet  (Symmlet  8) 

waterfal(wvl2tle(:,  7:12)'),  title('Distribution  of  Neural  Net  Output  File '), 

xlabeK'Test  File  Number') 

ylabel('Class') 

axis([0  max(size(wvl2tle))  0  6  min(min(w3v7t))  max(max(w3v7t))]),  view([23,82]) 
%[mn,sd]=oumnr3  (wvlettle) 
print  wvlsSale  -deps 

%figure,meshc(mn(:,7:12)),  title('Mean  of  w3v7t') 
figure  %  wvl2tle  ALE  Wavelet  (Coiflet  3) 

waterfal(wvl2tle(:,  7:12)'),  title('Distribution  of  Neural  Net  Output  File  '), 

xlabeK'Test  File  Number') 

ylabeK'Class') 

axis([0  max(size(wvl2tle))  0  6  min(min(w3v7t))  max(max(w3v7t))]),  view([23,82]) 

%[mn,sd]=oumnr3(wvl2tle) 

print  wvlC3ale  -deps 

%figure,meshc(mn(:,7:12)),  title('Mean  of  w3v7t') 
figure  %  bothawts  AR  &  Wavelet  (Coiflet  3) 

waterfal(bothawts(:,  7:12)'),  title('Distribution  of  Neural  Net  Output  File'), 

xlabeK'Test  File  Number') 

ylabeK'Class') 

axis([0  max(size(bothawts))  0  6  min(min(w3v7t))  max(max(w3v7t))]),  view([23,82]) 

%(mn,sd]=oumnr3(bothawts) 

print  barwvC3  -deps 

%figure,meshc(mn(:,7:12)),  title('Mean  of  w3v7t') 

figure  %  btstw2  ALE  AR  &  Wavelet  (Coiflet  3) 
waterfal(btstw2(:,  7:12)'),  title('Distribution  of  Neural  Net  Output  File'), 
xlabeK'Test  File  Number') 
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ylabeirciass') 

axis^fO  maxfsi7c(btst\v2))  0  6  min(min(w3v70)  niax(max(\v3v7t))]),  view([23,82]) 

^r(mn.sd]=outnnr3(btst\v2) 

print  abar\\TC3  -deps 

'rcfigure,meshcrmn(:.7:12)),  title('Mcan  of  w3v70 


OUTNNR.M 

^outnnr.m 

For  calculating  the  mean  and  sid  of  output  from  nnr  format 
^  for  all  A  Trous  non-orthogonal  wavelet  energy  per  voice  per  octave  coefficients 
^  Written  bv  R.C.Bennett 
r.  4DEC94 

function  [mn.sd]=outnnr(nnr) 

sperm=nnr(l:51,:); 

kilicr=nnr(52:102,:); 

humpback=nnr(103:153.:): 

gra>‘=nnr(l  54:204,:); 

pilot=nnr(205:255,:): 

earth=nnr('256:306,:); 

m  sperm =m  ean  (sperm ) : 

sdsperm=std(spcrm): 

mkillcr=mean  (killer); 

sdkiller=std(killcr): 

mhump-mean  (humpback): 

sdhump~std(humpback): 

m  gr a y=m  can  (gray) ; 

sdgray=std(gray); 

mpilot=mean(pilot); 

sdpiiot=std(piIot): 

mearth =m  ean  (ear  th ) ; 

sdearth=std(earth): 

mn=[mspenn:  mkiller;  mhump:  mgray;  mpilot;  mearth]; 
sd=[sdsperm:  sdkillcr:  sdhump;  sdgray;  sdpilot;  sdearth]; 
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